957 results for Imbalanced datasets


Relevance:

10.00% 10.00%

Publisher:

Abstract:

The challenges of maintaining a building such as the Sydney Opera House are immense and depend on a vast array of information. The value of information can be enhanced by its currency, its accessibility and the ability to correlate data sets (integration of information sources). A digital facility model is defined here as a building information model correlated with various information sources related to the facility. Such a digital facility model would give transparent, integrated access to an array of datasets and would support Facility Management processes. To construct such a digital facility model, two state-of-the-art information and communication technologies are considered: an internationally standardized building information model called the Industry Foundation Classes (IFC), and a variety of advanced communication and integration technologies often referred to as the Semantic Web, such as the Resource Description Framework (RDF) and the Web Ontology Language (OWL). This paper reports on some technical aspects of developing a digital facility model, focusing on the Sydney Opera House. The proposed digital facility model enables IFC data to participate in an ontology-driven, service-oriented software environment. A proof-of-concept prototype has been developed demonstrating that IFC information can be used together with the Sydney Opera House’s specific data sources using semantic web ontologies.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The Google Online Marketing Challenge is an ongoing collaboration between Google and academics to give students experiential learning. The Challenge gives student teams US$200 in AdWords, Google’s flagship advertising product, to develop online marketing campaigns for actual businesses. The end result is an engaging in-class exercise that provides students and professors with an exciting and pedagogically rigorous competition. Results from surveys at the end of the Challenge reveal positive appraisals from the three main constituents (students, businesses, and professors); general agreement between students and instructors regarding learning outcomes; and a few points of difference between students and instructors. In addition to describing the Challenge and its outcomes, this article reviews the post-participation questionnaires and subsequent datasets. The questionnaires and results are publicly available, and this article invites educators to mine the datasets, share their results, and offer suggestions for future iterations of the Challenge.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Automatic detection of suspicious activities in CCTV camera feeds is crucial to the success of video surveillance systems. Such a capability can help transform dumb CCTV cameras into smart surveillance tools for fighting crime and terror. Learning and classifying basic human actions is a precursor to detecting suspicious activities. Most current approaches rely on the unrealistic assumption that a complete dataset of normal human actions is available. This paper presents a different approach to the problem of understanding human actions in video when no prior information is available. This is achieved by working with an incomplete dataset of basic actions that is continuously updated. Initially, all video segments are represented with the Bag-of-Words (BoW) method using only Term Frequency-Inverse Document Frequency (TF-IDF) features. Then, a data-stream clustering algorithm is applied to update the system's knowledge from the incoming video feeds. Finally, all the actions are classified into different sets. Experiments and comparisons are conducted on the well-known Weizmann and KTH datasets to show the efficacy of the proposed approach.
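The TF-IDF weighting mentioned in this abstract can be illustrated with a minimal sketch. It assumes each video segment has already been quantised into a list of visual-word labels; the labels `w1`, `w2`, `w3`, the function name `tf_idf` and the plain `tf * log(N / df)` variant are illustrative assumptions, not the paper's implementation.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: weight} dict per document.

    documents: list of lists of word labels (hypothetical visual words).
    Uses tf = count / document length and idf = log(N / document frequency).
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each word occurs.
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))

    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights


# Three toy "video segments", each a bag of visual words.
segments = [["w1", "w2", "w2"], ["w1", "w3"], ["w1", "w3", "w3"]]
vectors = tf_idf(segments)
```

A word that occurs in every segment (here `w1`) receives weight zero, so only words that discriminate between segments contribute to the representation.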

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Over the last few decades, most large cities in the developing world have experienced rapid and imbalanced transport sector development, resulting in severe congestion and poor levels of service. The most common policy response has been to focus on private and public motorized transport modes, and especially on traffic control measures and mass transit systems. Despite their major role in the overall transport system in many developing cities in Asia and Latin America, relatively little attention is given to non-motorized transport (NMT) modes (walking, bicycles and cycle rickshaws). This applies in particular to the paid category of non-motorized public transport (NMPT), notably three-wheeler cycle rickshaws, which still have an important socio-economic, environmental and trip-making role in many developing cities. Despite this, they are often seen as inefficient and backward, an impediment to progress, and inconsistent with a modern urban image. Policy measures to restrict or eliminate non-motorized transport from urban arterials and other feeder networks have therefore been implemented in cities as diverse as Dhaka, Delhi, Karachi, Bangkok, Jakarta, Manila, Surabaya and Beijing. This paper primarily investigates the key contribution of NMPT to the sustainable transport system and urban fabric of developing cities, with Dhaka as a case study. The paper also highlights in detail the impediments to NMPT development and introduces the possible role this mode is expected to play in the future of these cities.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Research has noted a ‘pronounced pattern of increase with increasing remoteness’ of death rates in road crashes. However, crash characteristics by remoteness are not commonly or consistently reported, with definitions of rural and urban often relying on proxy representations such as prevailing speed limit. The current paper seeks to evaluate the efficacy of the Accessibility/Remoteness Index of Australia (ARIA+) for identifying trends in road crashes. ARIA+ does not rely on road-specific measures and uses distances to populated centres to attribute a score to an area, which can in turn be grouped into 5 classifications of increasing remoteness. The current paper applies these classifications at the broad level of the Australian Bureau of Statistics' Statistical Local Areas, thus avoiding the need for precise crash location or dedicated mapping software. Analyses used Queensland road crash database details for all 31,346 crashes resulting in a fatality or hospitalisation that occurred between 1 July 2001 and 30 June 2006 inclusive. Results showed that this simplified application of ARIA+ aligned with previous definitions such as speed limit, while also providing further delineation. Differences in crash contributing factors were noted with increasing remoteness, such as a greater representation of alcohol and ‘excessive speed for circumstances’. Other factors, such as the predominance of younger drivers in crashes, differed little by remoteness classification. The results are discussed in terms of the utility of remoteness as a graduated rather than binary (rural/urban) construct and the potential for combining ARIA crash data with census and hospital datasets.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Background and Objective: As global warming continues, the frequency, intensity and duration of heatwaves are likely to increase. However, a heatwave is unlikely to be defined uniformly because acclimatisation plays a significant role in determining the heat-related impact. This study investigated how best to define a heatwave in Brisbane, Australia. Methods: Computerised datasets on daily weather, air pollution and health outcomes between 1996 and 2005 were obtained from pertinent government agencies. Paired t-tests and case-crossover analyses were performed to assess the relationship between heatwaves and health outcomes using different heatwave definitions. Results: The maximum temperature was as high as 41.5°C, with a mean maximum daily temperature of 26.3°C. None of the five commonly used heatwave definitions suited Brisbane well on the basis of the health effects of heatwaves. Additionally, there were pros and cons when local definitions were attempted using either a relative or an absolute definition of extreme temperatures. Conclusion: The issue of how best to define a heatwave is complex. It is important to identify an appropriate local definition of a heatwave and to understand its health effects.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Objective: To summarise the extent to which narrative text fields in administrative health data are used to gather information about the event resulting in presentation to a health care provider for treatment of an injury, and to highlight best practice approaches to conducting narrative text interrogation for injury surveillance purposes. Design: Systematic review. Data sources: Electronic databases searched included CINAHL, Google Scholar, Medline, Proquest, PubMed and PubMed Central. Snowballing strategies were employed by searching the bibliographies of retrieved references to identify relevant associated articles. Selection criteria: Papers were selected if the study used a health-related database and if the study objectives were to a) use text fields to identify injury cases or to extract additional information on injury circumstances not available from coded data, or b) use text fields to assess the accuracy of coded data fields for injury-related cases, or c) describe methods/approaches for extracting injury information from text fields. Methods: The papers identified through the search were independently screened by two authors for inclusion, resulting in 41 papers selected for review. Due to heterogeneity between studies, meta-analysis was not performed. Results: The majority of papers reviewed focused on describing injury epidemiology trends using coded data, with text fields used to supplement the coded data (28 papers). These studies demonstrated the value of text data for providing more specific information beyond what had been coded, to enable case selection or provide circumstantial information. Caveats were expressed about the consistency and completeness of recording of text information, which results in underestimates when using these data. Four coding validation papers were reviewed, with these studies showing the utility of text data for validating and checking the accuracy of coded data.
Seven studies (9 papers) described methods for interrogating injury text fields for systematic extraction of information, with a combination of manual and semi-automated methods used to refine and develop algorithms for extraction and classification of coded data from text. Quality assurance approaches to assessing the robustness of the methods for extracting text data were discussed in only 8 of the epidemiology papers and 1 of the coding validation papers. All of the text interrogation methodology papers described systematic approaches to ensuring the quality of the approach. Conclusions: Manual review and coding approaches, text search methods, and statistical tools have been utilised to extract data from narrative text and translate it into useable, detailed injury event information. These techniques can be, and have been, applied to administrative datasets to identify specific injury types and add value to previously coded injury datasets. Only a few studies thoroughly described the methods used for text mining, and fewer than half of the studies reviewed used or described quality assurance methods for ensuring the robustness of the approach. New techniques utilising semi-automated computerised approaches and Bayesian/clustering statistical methods offer the potential to further develop and standardise the analysis of narrative text for injury surveillance.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The problem of impostor dataset selection for GMM-based speaker verification is addressed through the recently proposed data-driven background dataset refinement technique. The SVM-based refinement technique selects from a candidate impostor dataset those examples that are most frequently selected as support vectors when training a set of SVMs on a development corpus. This study demonstrates the versatility of dataset refinement in the task of selecting suitable impostor datasets for use in GMM-based speaker verification. The use of refined Z- and T-norm datasets provided performance gains of 15% in EER in the NIST 2006 SRE over the use of heuristically selected datasets. The refined datasets were shown to generalise well to the unseen data of the NIST 2008 SRE.
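The selection rule described in this abstract (keep the candidates most frequently chosen as support vectors) can be sketched as follows. The sketch assumes the SVMs have already been trained elsewhere and that, for each one, the indices of the candidate impostor examples it selected as support vectors are available; the function names and the toy index lists are hypothetical, and the paper's actual ranking and cut-off may differ.

```python
from collections import Counter


def rank_by_support_vector_frequency(support_vector_sets, n_candidates):
    """Rank candidate indices by how many SVMs picked them as support vectors.

    support_vector_sets: one iterable of candidate indices per trained SVM
    (assumed to come from an external SVM training step).
    Ties are broken by the lower index.
    """
    counts = Counter()
    for selected in support_vector_sets:
        counts.update(set(selected))  # count each candidate once per SVM
    return sorted(range(n_candidates), key=lambda i: (-counts[i], i))


def refine(support_vector_sets, n_candidates, keep):
    """Return the `keep` most frequently selected candidate indices."""
    return rank_by_support_vector_frequency(support_vector_sets, n_candidates)[:keep]


# Toy example: 5 candidate impostors, 3 SVMs trained on a development corpus.
sv_sets = [[0, 2, 3], [2, 3], [3, 4]]
ranking = rank_by_support_vector_frequency(sv_sets, 5)
refined = refine(sv_sets, 5, keep=2)
```

In this toy case candidate 3 is selected by all three SVMs and candidate 1 by none, so a refined set of size two keeps candidates 3 and 2.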

Relevance:

10.00% 10.00%

Publisher:

Abstract:

A data-driven background dataset refinement technique was recently proposed for SVM-based speaker verification. This method selects a refined SVM background dataset from a set of candidate impostor examples after individually ranking examples by their relevance. This paper extends this technique to the refinement of the T-norm dataset for SVM-based speaker verification. The independent refinement of the background and T-norm datasets provides a means of investigating the sensitivity of SVM-based speaker verification performance to the selection of each of these datasets. Using refined datasets provided improvements of 13% in min. DCF and 9% in EER over the full set of impostor examples on the 2006 SRE corpus, with the majority of these gains due to refinement of the T-norm dataset. Similar trends were observed for the unseen data of the NIST 2008 SRE.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

This work presents an extended Joint Factor Analysis (JFA) model that includes explicit modelling of unwanted within-session variability. The goals of the proposed extended JFA model are to improve verification performance with short utterances by compensating for the effects of limited or imbalanced phonetic coverage, and to produce a flexible JFA model that is effective over a wide range of utterance lengths without adjusting model parameters, such as by retraining session subspaces. Experimental results on the 2006 NIST SRE corpus demonstrate the flexibility of the proposed model: it provides competitive results over a wide range of utterance lengths without retraining and also yields modest improvements over the current state of the art in a number of conditions.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

XML document clustering is essential for many document handling applications such as information storage, retrieval, integration and transformation. An XML clustering algorithm should process both the structural and the content information of XML documents in order to improve the accuracy and meaning of the clustering solution. However, the inclusion of both kinds of information in the clustering process results in a huge overhead for the underlying clustering algorithm because of the high dimensionality of the data. This paper introduces a novel approach that first determines the structural similarity in the form of frequent subtrees and then uses these frequent subtrees to represent the constrained content of the XML documents in order to determine the content similarity. The proposed method reduces the high dimensionality of input data by using only the structure-constrained content. The empirical analysis reveals that the proposed method can effectively cluster even very large XML datasets and outperform other existing methods.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Association rule mining is a technique widely used when querying databases, especially transactional ones, to obtain useful associations or correlations among sets of items. Much work has been done focusing on efficiency, effectiveness and redundancy. There has also been a focus on the quality of rules from single-level datasets, with many interestingness measures proposed. However, with multi-level datasets now common, there is a lack of interestingness measures developed for multi-level and cross-level rules. Single-level measures do not take into account the hierarchy found in a multi-level dataset. This leaves the Support-Confidence approach, which does not consider the hierarchy either and has other drawbacks, as one of the few measures available. In this paper we propose two approaches which measure multi-level association rules to help evaluate their interestingness. These measures of diversity and peculiarity can be used to help identify those rules from multi-level datasets that are potentially useful.
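For reference, the Support-Confidence approach that this abstract treats as the baseline can be written in a few lines. This is a minimal sketch of the standard definitions only; it does not implement the diversity or peculiarity measures proposed in the paper, and the toy transactions and function names are illustrative assumptions.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # rule never applies; reported as 0 by convention here
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base


# Toy transactional database (illustrative only).
transactions = [
    {"milk", "bread"},
    {"milk"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
]
rule_support = support(transactions, {"milk", "bread"})
rule_confidence = confidence(transactions, {"milk"}, {"bread"})
```

Both measures look only at flat item co-occurrence, which is why, as the abstract notes, they cannot reflect where items sit in a concept hierarchy.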

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Objective: We aimed to predict sub-national spatial variation in numbers of people infected with Schistosoma haematobium, and associated uncertainties, in Burkina Faso, Mali and Niger, prior to implementation of national control programmes. Methods: We used national field survey datasets covering a contiguous area 2,750 × 850 km, from 26,790 school-aged children (5–14 years) in 418 schools. Bayesian geostatistical models were used to predict prevalence of high and low intensity infections and associated 95% credible intervals (CrI). Numbers infected were determined by multiplying predicted prevalence by numbers of school-aged children in 1 km² pixels covering the study area. Findings: Numbers of school-aged children with low-intensity infections were: 433,268 in Burkina Faso, 872,328 in Mali and 580,286 in Niger. Numbers with high-intensity infections were: 416,009 in Burkina Faso, 511,845 in Mali and 254,150 in Niger. 95% CrIs (indicative of uncertainty) were wide; e.g. the mean number of boys aged 10–14 years infected in Mali was 140,200 (95% CrI 6200, 512,100). Conclusion: National aggregate estimates for numbers infected mask important local variation, e.g. most S. haematobium infections in Niger occur in the Niger River valley. Prevalence of high-intensity infections was strongly clustered in foci in western and central Mali, north-eastern and north-western Burkina Faso and the Niger River valley in Niger. Populations in these foci are likely to carry the bulk of the urinary schistosomiasis burden and should receive priority for schistosomiasis control. Uncertainties in predicted prevalence and numbers infected should be acknowledged and taken into consideration by control programme planners.
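The step from predicted prevalence to numbers infected is a per-pixel multiplication followed by a sum over pixels. A minimal sketch, assuming prevalence and child counts are available as two aligned lists with one entry per 1 km² pixel; the values below are invented for illustration and are not taken from the study.

```python
def numbers_infected(prevalence, children):
    """Sum predicted prevalence x number of children over all pixels.

    prevalence: predicted infection prevalence per pixel (0 to 1).
    children:   number of school-aged children per pixel.
    """
    if len(prevalence) != len(children):
        raise ValueError("prevalence and children must cover the same pixels")
    return sum(p * c for p, c in zip(prevalence, children))


# Three hypothetical pixels (not data from the study).
pixel_prevalence = [0.10, 0.25, 0.0]
pixel_children = [200, 80, 500]
total_infected = numbers_infected(pixel_prevalence, pixel_children)
```

Applying the same function to the lower and upper credible limits of prevalence gives only a rough bound; the credible intervals reported in the abstract come from the Bayesian model itself.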

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The study reported here constitutes a full review of the major geological events that have influenced the morphological development of the southeast Queensland region. Most importantly, it provides evidence that the region’s physiography continues to be geologically ‘active’ and that, although earthquakes are presently few and of low magnitude, many past events and tectonic regimes continue to strongly influence drainage, morphology and topography. Southeast Queensland is typified by highland terrain of metasedimentary and igneous rocks that are parallel and close to younger, lowland coastal terrain. The region is currently situated in a passive margin tectonic setting that is now under compressive stress, although in the past the region was subject to alternating extensional and compressive regimes. As part of the investigation, the effects of many past geological events upon landscape morphology have been assessed at multiple scales using features such as the location and orientation of drainage channels, topography, faults, fractures, scarps, cleavage, volcanic centres and deposits, and recent earthquake activity. A number of hypotheses for local geological evolution are proposed and discussed. This study has also utilised a geographic information system (GIS) approach that successfully amalgamates the various types and scales of datasets used. A new method of stream ordination has been developed and is used to compare the orientation of channels of similar orders with rock fabric, in a topologically controlled approach that other ordering systems are unable to achieve. Stream pattern analysis has been performed and the results provide evidence that many drainage systems in southeast Queensland are controlled by known geological structures and by past geological events.
The results indicate that drainage at a fine scale is controlled by cleavage, joints and faults, while at a broader scale large river valleys, such as those of the Brisbane River and North Pine River, closely follow the location of faults. These rivers appear to have become entrenched by differential weathering along these planes of weakness. Significantly, stream pattern analysis has also identified some ‘anomalous’ drainage that suggests the orientations of these watercourses are geologically controlled, but by unknown causes. To the north of Brisbane, a ‘coastal drainage divide’ has been recognised and is described here. The divide crosses several lithological units of different age, continues parallel to the coast and prevents drainage from the highlands flowing directly to the coast for its entire length. Diversion of low-order streams away from the divide may indicate that a more recent process is the driving force. Although there is no conclusive evidence for this at present, it is postulated that the divide may have been generated by uplift or doming associated with mid-Cenozoic volcanism or a blind thrust at depth. Also north of Brisbane, on the D’Aguilar Range, an elevated valley (the ‘Kilcoy Gap’) has been identified that may have once drained towards the coast and now displays reversed drainage, which may have resulted from uplift along the coastal drainage divide and of the D’Aguilar blocks. An assessment of the distribution and intensity of recent earthquakes in the region indicates that activity may be associated with ancient faults. However, recent movement on these faults during these events would have been unlikely, given that earthquakes in the region are characteristically of low magnitude. There is, however, evidence that compressive stress is building and being released periodically, and ancient faults are a likely place for this stress to be released.
The relationship between ancient fault systems and the Tweed Shield Volcano is also discussed, and it is suggested here that the volcanic activity was associated with renewed faulting on the Great Moreton Fault System during the Cenozoic. The geomorphology and drainage patterns of southeast Queensland have been compared with the morphological characteristics expected at passive and other tectonic settings, both in Australia and globally. Of note are the comparisons with the East Brazilian Highlands, the Gulf of Mexico and the Blue Ridge Escarpment. In conclusion, the results of the study clearly show that, although the region is described as a passive margin, its complex geological history and present compressive stress regime provide a more intricate and varied landscape than would be expected along typical passive continental margins. The literature review provides background to the subject and discusses previous work and methods, whilst the findings are presented in three peer-reviewed, published papers. The methods, hypotheses, suggestions and evidence are discussed at length in the final chapter.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Precise, up-to-date and increasingly detailed road maps are crucial for various advanced road applications, such as lane-level vehicle navigation and advanced driver assistance systems. With very high resolution (VHR) imagery from digital airborne sources, data acquisition, collection and updating would be greatly facilitated if road details could be automatically extracted from the aerial images. In this paper, we propose an effective approach to detecting road lane information from aerial images using an object-oriented image analysis method. The proposed algorithm starts by constructing the DSM and true orthophotos from the stereo images. The road lane details are detected using an object-oriented, rule-based image classification approach. Because other objects with similar spectral and geometrical attributes interfere with the detection, the extracted road lanes are filtered with the road surface obtained by a progressive two-class decision classifier. The generated road network is evaluated using the datasets provided by the Queensland Department of Main Roads. The evaluation shows completeness values that range between 76% and 98% and correctness values that range between 82% and 97%.
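The abstract does not define its completeness and correctness measures. A minimal sketch, assuming the definitions commonly used in road-extraction evaluation (completeness as the share of the reference that was matched, correctness as the share of the extraction that was matched); the counts and function names are illustrative, not results or code from the paper.

```python
def completeness(true_pos, false_neg):
    """Share of the reference road data that the extraction matched."""
    return true_pos / (true_pos + false_neg)


def correctness(true_pos, false_pos):
    """Share of the extracted road data that matches the reference."""
    return true_pos / (true_pos + false_pos)


# Hypothetical counts (e.g. matched lane segments), not values from the paper.
tp, fp, fn = 90, 10, 20
comp = completeness(tp, fn)
corr = correctness(tp, fp)
```

Under these definitions completeness plays the role of recall and correctness the role of precision, so the two ranges quoted in the abstract can be read as recall and precision ranges across test sites.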