942 results for random forest data analysis


Relevance: 100.00%

Abstract:

Citizen Science projects are initiatives in which members of the general public participate in scientific research and perform or manage research-related tasks such as data collection and/or data annotation. Citizen Science is technologically possible and scientifically significant. However, although research teams can save time and money by recruiting citizens who volunteer their time and skills to help with data analysis, the reliability of contributed data varies considerably. Data reliability issues are significant in Citizen Science because of the quantity and diversity of the people and devices involved: participants may submit low-quality, misleading, inaccurate, or even malicious data. Improving data reliability has therefore become an urgent need. This study investigates techniques to enhance the reliability of data contributed by citizens to scientific research projects, especially acoustic sensing projects. In particular, we propose a reputation framework to enhance data reliability and investigate critical elements that designers should be aware of when developing new reputation systems.
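
The abstract does not specify the reputation mechanism itself. One common building block in this literature is a beta reputation score, in which a contributor's reliability is the posterior mean of a Beta distribution over their accepted and rejected submissions. The sketch below is a minimal illustration under that assumption; the class and field names are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class BetaReputation:
    """Minimal beta reputation score (hypothetical names): alpha counts
    accepted submissions, beta counts rejected ones, both with a prior
    pseudo-count of 1."""
    alpha: float = 1.0
    beta: float = 1.0

    def update(self, accepted: bool) -> None:
        if accepted:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def score(self) -> float:
        # Expected reliability under a Beta(alpha, beta) posterior.
        return self.alpha / (self.alpha + self.beta)

# Weight a contributor's future submissions by their current score.
rep = BetaReputation()
for outcome in [True, True, False, True]:
    rep.update(outcome)
print(f"reliability weight: {rep.score:.2f}")  # 0.67
```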

Relevance: 100.00%

Abstract:

This paper describes an innovative platform that facilitates the collection of objective safety data on occurrences at railway level crossings, using data sources including forward-facing video, telemetry from trains, and geo-referenced asset and survey data. The platform is being developed with support from the Australian rail industry and the Cooperative Research Centre for Rail Innovation. The paper describes the underlying accident causation model, the development methodology and refinement process, and the data collection platform itself. It concludes with a brief discussion of the benefits this project is expected to provide to the Australian rail industry.

Relevance: 100.00%

Abstract:

Summary: More than ever before, contemporary societies are characterised by the huge amounts of data being transferred. Authorities, companies, academia and other stakeholders refer to Big Data when discussing the importance of large and complex datasets and developing possible solutions for their use. Big Data promises to be the next frontier of innovation for institutions and individuals, yet it also offers possibilities to predict and influence human behaviour with ever-greater precision.

Relevance: 100.00%

Abstract:

In Australia, as in some other western nations, governments impose accountability measures on educational institutions (Earl, 2005). One such accountability measure is the National Assessment Program - Literacy and Numeracy (NAPLAN), from which high-stakes assessment data are generated. In this article, a practical method of data analysis known as Over Time Assessment Data Analysis (OTADA) is offered as an analytical process by which schools can monitor their current and over-time performance. This analysis, developed by the author, is currently used extensively in schools throughout Queensland. By analysing in this way, teachers, and in particular principals, can obtain a quick and insightful performance overview. For those seeking to track the achievements and progress of year-level cohorts, the OTADA should be considered.
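
OTADA's mechanics are not given in the abstract. Purely as a generic illustration of over-time cohort monitoring of assessment data (not the OTADA itself; all column names and scores are hypothetical):

```python
import pandas as pd

# Hypothetical NAPLAN-style records: one row per student result.
results = pd.DataFrame({
    "year":   [2019, 2019, 2020, 2020, 2021, 2021],
    "cohort": ["Year 3", "Year 5", "Year 3", "Year 5", "Year 3", "Year 5"],
    "score":  [412, 489, 425, 501, 431, 497],
})

# Mean score per cohort per year: the current snapshot plus the
# over-time trend a principal would scan for.
over_time = (results
             .groupby(["cohort", "year"])["score"]
             .mean()
             .unstack("year"))
print(over_time)
```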

Relevance: 100.00%

Abstract:

To the Editor—In a recent review article in Infection Control and Hospital Epidemiology, Umscheid et al.1 summarized published data on incidence rates of catheter-associated bloodstream infection (CABSI), catheter-associated urinary tract infection (CAUTI), surgical site infection (SSI), and ventilator-associated pneumonia (VAP); estimated how many cases are preventable; and calculated the savings in hospital costs and lives that would result from preventing all preventable cases. Providing these estimates to policy makers, political leaders, and health officials helps to galvanize their support for infection prevention programs. Our concern is that important limitations of the published studies on which Umscheid and colleagues built their findings are incompletely addressed in this review. More attention needs to be drawn to the techniques applied to generate these estimates...

Relevance: 100.00%

Abstract:

Many techniques in information retrieval produce counts from a sample, and it is common to analyse these counts as proportions of the whole; term frequencies are a familiar example. Proportions carry only relative information and are not free to vary independently of one another: for the proportion of one term to increase, one or more others must decrease. These constraints are hallmarks of compositional data. While there has long been discussion in other fields of how such data should be analysed, to our knowledge Compositional Data Analysis (CoDA) has not been considered in IR. In this work we explore compositional data in IR through the lens of distance measures, and demonstrate that common measures, naïve to compositions, have some undesirable properties which can be avoided with composition-aware measures. As a practical example, these measures are shown to improve clustering.
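
A worked example of the kind of composition-aware measure CoDA provides: the Aitchison distance, which is the Euclidean distance between centred log-ratio (clr) transforms of the compositions. This is a standard CoDA tool, not necessarily the exact measure evaluated in the paper. It distinguishes a rare term doubling its share from a common term shifting by the same absolute amount, which plain Euclidean distance cannot:

```python
import numpy as np

def clr(p):
    """Centred log-ratio transform; parts must be strictly positive."""
    logp = np.log(p)
    return logp - logp.mean()

def aitchison(p, q):
    """Composition-aware distance: Euclidean in clr coordinates."""
    return np.linalg.norm(clr(np.asarray(p)) - clr(np.asarray(q)))

# Pair a: a rare term doubles its share. Pair b: a common term moves
# by the same absolute amount. Euclidean cannot tell the pairs apart.
a1, a2 = [0.01, 0.29, 0.70], [0.02, 0.28, 0.70]
b1, b2 = [0.40, 0.30, 0.30], [0.41, 0.29, 0.30]

print(np.linalg.norm(np.subtract(a1, a2)))  # ~0.014
print(np.linalg.norm(np.subtract(b1, b2)))  # ~0.014 (identical)
print(aitchison(a1, a2))                    # ~0.58 (large: rare term doubled)
print(aitchison(b1, b2))                    # ~0.04 (small relative change)
```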

Relevance: 100.00%

Abstract:

Critical to the research of urban morphologists is the availability of historical records that document the urban transformation of the study area. However, thus far little work has been done towards an empirical approach to validating archival data in this field. This paper therefore outlines a new methodology for validating the accuracy of archival records and mapping data accrued through the process of urban morphological research, so as to establish a reliable platform from which analysis can proceed. The paper particularly addresses the problems of inaccuracies in existing curated historical information, as well as errors in archival research by student assistants, which together give rise to unacceptable levels of uncertainty in the documentation. The paper discusses the problems relating to the reliability of historical information, demonstrates the importance of data verification in urban morphological research, and proposes a rigorous method for objectively testing collected archival data using qualitative data analysis software.

Relevance: 100.00%

Abstract:

The development of microfinance in Vietnam since the 1990s has coincided with remarkable progress in poverty reduction. Numerous descriptive studies have illustrated that microfinance is an effective tool for eradicating poverty in Vietnam, but evidence from quantitative studies is mixed. This study contributes to the literature by providing new evidence on the impact of microfinance on poverty reduction in Vietnam, using repeated cross-sectional data from the Vietnam Living Standards Survey (VLSS) over the period 1992-2010. Our results show that micro-loans contribute significantly to household consumption.
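
A minimal sketch of the shape such an analysis can take on pooled repeated cross-sections: a regression of consumption on loan receipt with survey-year fixed effects. The variable names and synthetic data below are hypothetical stand-ins, not the VLSS variables or the paper's specification.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500

# Hypothetical stand-in for pooled VLSS rounds: one row per household.
df = pd.DataFrame({
    "log_consumption": rng.normal(8.0, 0.5, n),
    "microloan": rng.integers(0, 2, n),           # 1 = received a micro-loan
    "hh_size": rng.integers(1, 9, n),
    "year": rng.choice([1992, 1998, 2004, 2010], n),
})

# Pooled OLS with survey-year fixed effects; the microloan coefficient
# measures the (naive) association between borrowing and consumption.
model = smf.ols("log_consumption ~ microloan + hh_size + C(year)", data=df)
print(model.fit().summary().tables[1])
```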

Relevance: 100.00%

Abstract:

Most real-life data analysis problems are difficult to solve using exact methods, due to the size of the datasets and the nature of the underlying mechanisms of the system under investigation. As datasets grow ever larger, finding the balance between the quality of the approximation and the computing time of the heuristic becomes non-trivial. One solution is to consider parallel methods, using the increased computational power to perform a deeper exploration of the solution space in a similar time. It is, however, difficult to estimate a priori whether parallelisation will provide the expected improvement. In this paper we consider a well-known method, genetic algorithms, and evaluate the behaviour of the classic and parallel implementations on two distinct problem types.
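
A minimal sketch of the comparison the paper describes: the same genetic algorithm run with fitness evaluation done serially or spread across worker processes, since evaluation is the naturally parallel step. The toy one-max objective is a hypothetical stand-in for the problems studied.

```python
import random
from multiprocessing import Pool

def fitness(ind):
    """Toy objective (hypothetical stand-in): count of one-bits."""
    return sum(ind)

def evolve(pop, generations, pool=None):
    for _ in range(generations):
        # The parallelisable step: score every individual. With a pool,
        # evaluations run concurrently in worker processes.
        scores = pool.map(fitness, pop) if pool else list(map(fitness, pop))
        order = sorted(range(len(pop)), key=scores.__getitem__, reverse=True)
        parents = [pop[i] for i in order[: len(pop) // 2]]
        children = []
        while len(parents) + len(children) < len(pop):
            p1, p2 = random.sample(parents, 2)
            cut = random.randrange(1, len(p1))            # one-point crossover
            children.append([b ^ (random.random() < 0.01)  # 1% bit-flip mutation
                             for b in p1[:cut] + p2[cut:]])
        pop = parents + children
    return max(pop, key=fitness)

if __name__ == "__main__":
    population = [[random.randint(0, 1) for _ in range(64)] for _ in range(200)]
    with Pool() as pool:                                   # parallel evaluation
        print(fitness(evolve(population, 50, pool)))
```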

Relevance: 100.00%

Abstract:

There is currently a lack of understanding regarding the use of unregistered vehicles on public roads and road-related areas, and the links between driving unregistered vehicles and a range of dangerous driving behaviours. This report documents the findings of data analysis conducted to investigate the links between unlicensed driving and the driving of unregistered vehicles, and is an important initial step towards understanding these behaviours. The report examines de-identified data from two sources, crash data and offence data, extracted from Queensland Department of Transport and Main Roads (TMR) databases and covering the period from 2003 to 2008.

Relevance: 100.00%

Abstract:

Big datasets are endemic, but they are often notoriously difficult to analyse because of their size, heterogeneity, history and quality. The purpose of this paper is to open a discourse on the use of modern experimental design methods to analyse Big Data in order to answer particular questions of interest. By appealing to a range of examples, it is suggested that this perspective on Big Data modelling and analysis has wide generality and advantageous inferential and computational properties. In particular, the principled experimental design approach is shown to provide a flexible framework for analysis that, for certain classes of objectives and utility functions, delivers near-equivalent answers compared with analyses of the full dataset under a controlled error rate. It can also provide a formalised method for iterative parameter estimation, model checking, identification of data gaps and evaluation of data quality. Finally, it has the potential to add value to other Big Data sampling algorithms, in particular divide-and-conquer strategies, by determining efficient sub-samples.
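
One way to make the design-based subsampling idea concrete (an assumption on my part, not the paper's algorithm): greedily build an approximately D-optimal subsample, i.e. pick the rows that most increase det(XᵀX), then fit on that small subsample instead of the full data.

```python
import numpy as np

def d_optimal_subsample(X, k):
    """Greedy approximately D-optimal selection of k rows of X.

    Uses the matrix determinant lemma det(M + x x') = det(M)(1 + x' M^-1 x),
    so each step adds the unchosen row maximising x' M^-1 x.
    """
    n, p = X.shape
    M = 1e-6 * np.eye(p)              # small ridge keeps M invertible early on
    mask = np.ones(n, dtype=bool)
    chosen = []
    for _ in range(k):
        Minv = np.linalg.inv(M)
        gain = np.einsum("ij,jk,ik->i", X, Minv, X)  # x' M^-1 x per row
        gain[~mask] = -np.inf
        i = int(np.argmax(gain))
        chosen.append(i)
        mask[i] = False
        M += np.outer(X[i], X[i])
    return np.array(chosen)

rng = np.random.default_rng(1)
X = rng.normal(size=(50_000, 5))                 # stand-in "big" dataset
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(size=50_000)

idx = d_optimal_subsample(X, 500)                # analyse 1% of the rows
beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
print(beta.round(2))                             # close to the full-data fit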

Relevance: 100.00%

Abstract:

Selection criteria and misspecification tests for the intra-cluster correlation structure (ICS) in longitudinal data analysis are considered. In particular, the asymptotic distribution of the correlation information criterion (CIC) is derived, and a new method for selecting a working ICS is proposed by standardizing the selection criterion as a p-value. The CIC test is found to be powerful in detecting misspecification of working ICS structures, while for working ICS selection the standardized CIC test is also shown to have satisfactory performance. Simulation studies and applications to two real longitudinal datasets illustrate how these criteria and tests might be useful.
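
Under its usual definition (Hin and Wang's correlation information criterion), CIC is the trace of the product of the model-based precision from an independence working model and the robust covariance under the candidate structure, with smaller values preferred. A hedged sketch using statsmodels GEE, whose results expose these matrices as cov_naive and cov_robust; the simulated data are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
n_subj, n_obs = 100, 4

# Hypothetical longitudinal data with an exchangeable within-subject effect.
subj = np.repeat(np.arange(n_subj), n_obs)
x = rng.normal(size=n_subj * n_obs)
u = np.repeat(rng.normal(scale=0.8, size=n_subj), n_obs)  # shared per subject
y = 1.0 + 0.5 * x + u + rng.normal(size=n_subj * n_obs)
df = pd.DataFrame({"y": y, "x": x, "subj": subj})

def fit(cov_struct):
    return sm.GEE.from_formula("y ~ x", groups="subj", data=df,
                               cov_struct=cov_struct).fit()

# Model-based precision under the independence working model.
omega_indep = np.linalg.inv(fit(sm.cov_struct.Independence()).cov_naive)

for name, cs in [("independence", sm.cov_struct.Independence()),
                 ("exchangeable", sm.cov_struct.Exchangeable())]:
    cic = np.trace(omega_indep @ fit(cs).cov_robust)
    print(f"{name:13s} CIC = {cic:.3f}")   # smaller is better
```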

Relevance: 100.00%

Abstract:

A modeling paradigm is proposed for covariate, variance and working correlation structure selection in longitudinal data analysis. Appropriate selection of covariates is pertinent to correct variance modeling, and selecting the appropriate covariates and variance function is vital to correlation structure selection. This leads to a stepwise model selection procedure that deploys a combination of different model selection criteria. Although these criteria share a common theoretical root based on approximating the Kullback-Leibler distance, they are designed to address different aspects of model selection and have different merits and limitations. For example, the extended quasi-likelihood information criterion (EQIC) with a covariance penalty performs well for covariate selection even when the working variance function is misspecified, but EQIC contains little information on correlation structures. The proposed model selection strategies are outlined and a Monte Carlo assessment of their finite-sample properties is reported. Two longitudinal studies are used for illustration.

Relevance: 100.00%

Abstract:

The method of generalized estimating equations (GEEs) provides consistent estimates of the regression parameters in a marginal regression model for longitudinal data, even when the working correlation model is misspecified (Liang and Zeger, 1986). However, the efficiency of a GEE estimate can be seriously affected by the choice of the working correlation model. This study addresses this problem by proposing a hybrid method that combines multiple GEEs based on different working correlation models, using the empirical likelihood method (Qin and Lawless, 1994). Analyses show that this hybrid method is more efficient than a GEE using a misspecified working correlation model. Furthermore, if one of the working correlation structures correctly models the within-subject correlations, then this hybrid method provides the most efficient parameter estimates. In simulations, the hybrid method's finite-sample performance is superior to a GEE under any of the commonly used working correlation models and is almost fully efficient in all scenarios studied. The hybrid method is illustrated using data from a longitudinal study of the respiratory infection rates in 275 Indonesian children.
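
In outline, this is the Qin and Lawless empirical likelihood machinery applied to stacked GEEs; the notation below is mine, not the paper's, and is only a sketch of the construction:

```latex
% Subject i's GEE score under the k-th working correlation model:
%   g_{ik}(\beta) = D_i^{\top} V_{ik}^{-1} \bigl(y_i - \mu_i(\beta)\bigr).
% Stacking K working models gives the over-identified system
%   g_i(\beta) = \bigl(g_{i1}(\beta)^{\top}, \ldots, g_{iK}(\beta)^{\top}\bigr)^{\top},
% and the hybrid estimator maximises the empirical likelihood:
\hat{\beta} = \arg\max_{\beta}\,\sup\Bigl\{\prod_{i=1}^{n} n p_i :\;
  p_i \ge 0,\ \sum_{i=1}^{n} p_i = 1,\ \sum_{i=1}^{n} p_i\, g_i(\beta) = 0\Bigr\}
```

Because the system is over-identified, empirical likelihood implicitly weights the component GEEs, which is why the hybrid inherits the efficiency of whichever working correlation model is closest to correct.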