Digital Library

826 results for longitudinal Poisson data

Data preprocessing for anomaly based network intrusion detection : a review

Relevance: 20.00%

Abstract:

Data preprocessing is widely recognized as an important stage in anomaly detection. This paper reviews the data preprocessing techniques used by anomaly-based network intrusion detection systems (NIDS), concentrating on which aspects of the network traffic are analyzed, and what feature construction and selection methods have been used. Motivation for the paper comes from the large impact data preprocessing has on the accuracy and capability of anomaly-based NIDS. The review finds that many NIDS limit their view of network traffic to the TCP/IP packet headers. Time-based statistics can be derived from these headers to detect network scans, network worm behavior, and denial of service attacks. A number of other NIDS perform deeper inspection of request packets to detect attacks against network services and network applications. More recent approaches analyze full service responses to detect attacks targeting clients. The review covers a wide range of NIDS, highlighting which classes of attack are detectable by each of these approaches. Data preprocessing is found to predominantly rely on expert domain knowledge for identifying the most relevant parts of network traffic and for constructing the initial candidate set of traffic features. On the other hand, automated methods have been widely used for feature extraction to reduce data dimensionality, and feature selection to find the most relevant subset of features from this candidate set. The review shows a trend toward deeper packet inspection to construct more relevant features through targeted content parsing. These context sensitive features are required to detect current attacks.
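One way to picture the time-based header statistics mentioned in this abstract is a per-source count of distinct destination ports within a time window, a simple signal for network scans. The following is a minimal sketch and not a method taken from the reviewed paper; the function names, the `(timestamp, src_ip, dst_port)` tuple layout and the `max_ports` threshold are illustrative assumptions.

```python
from collections import defaultdict


def ports_per_source(packets, window_start, window_seconds):
    """Count distinct destination ports contacted by each source in one window.

    `packets` is an iterable of (timestamp, src_ip, dst_port) tuples, i.e.
    fields that can be read from TCP/IP headers alone (hypothetical layout).
    """
    window_end = window_start + window_seconds
    seen = defaultdict(set)
    for timestamp, src_ip, dst_port in packets:
        # Only packets inside [window_start, window_end) contribute.
        if window_start <= timestamp < window_end:
            seen[src_ip].add(dst_port)
    return {src: len(ports) for src, ports in seen.items()}


def flag_scanners(packets, window_start, window_seconds, max_ports):
    """Return sources that touched more than `max_ports` distinct ports."""
    counts = ports_per_source(packets, window_start, window_seconds)
    return sorted(src for src, n in counts.items() if n > max_ports)


if __name__ == "__main__":
    # Hypothetical traffic: one host probing ten ports, one ordinary client.
    packets = [(0.5, "10.0.0.5", port) for port in range(20, 30)]
    packets += [(1.0, "10.0.0.9", 80), (2.0, "10.0.0.9", 80)]
    print(flag_scanners(packets, window_start=0.0, window_seconds=60.0, max_ports=5))
```

A real NIDS would combine several such features and learn the threshold from traffic rather than fixing it by hand.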

An economic evaluation of the Safe Motherhood programme in Guangxi, China

Relevance: 20.00%

Abstract:

Maternal and infant mortality is a global health issue with a significant social and economic impact. Each year, over half a million women worldwide die due to complications related to pregnancy or childbirth, four million infants die in the first 28 days of life, and eight million infants die in the first year. Ninety-nine percent of maternal and infant deaths are in developing countries. Reducing maternal and infant mortality is among the key international development goals. In China, the national maternal mortality ratio and infant mortality rate were reduced greatly in the past two decades, yet a large discrepancy remains between urban and rural areas. To address this problem, a large-scale Safe Motherhood Programme was initiated in 2000. The programme was implemented in Guangxi in 2003. The programme included both demand-side and supply-side interventions focusing on increasing health service use and improving birth outcomes. Little is known about the effects and economic outcomes of the Safe Motherhood Programme in Guangxi, although it has been implemented for seven years. The aim of this research is to estimate the effectiveness and cost-effectiveness of the interventions in the Safe Motherhood Programme in Guangxi, China. The objectives of this research are: 1. To evaluate whether the changes in health service use and birth outcomes are associated with the interventions in the Safe Motherhood Programme. 2. To estimate the cost-effectiveness of the interventions in the Safe Motherhood Programme and quantify the uncertainty surrounding the decision. 3. To assess the expected value of perfect information associated with both the whole decision and individual parameters, and interpret the findings to inform priority setting in further research and policy making in this area. A quasi-experimental study design was used in this research to assess the effectiveness of the programme in increasing health service use and improving birth outcomes.
The study subjects were 51 intervention counties and 30 control counties. Data on health service use, birth outcomes and socio-economic factors from 2001 to 2007 were collected from the programme database and statistical yearbooks. Based on the profile plots of the data, general linear mixed models were used to evaluate the effectiveness of the programme while controlling for the effects of baseline levels of the response variables, change of socio-economic factors over time and correlations among repeated measurements from the same county. Redundant multicollinear variables were deleted from the mixed model using the results of the multicollinearity diagnoses. For each response variable, the best covariance structure was selected from 15 alternatives according to fit statistics including the Akaike information criterion, the finite-population corrected Akaike information criterion, and Schwarz's Bayesian information criterion. Residual diagnostics were used to validate the model assumptions. Statistical inferences were made to show the effect of the programme on health service use and birth outcomes. A decision analytic model was developed to evaluate the cost-effectiveness of the programme, quantify the decision uncertainty, and estimate the expected value of perfect information associated with the decision. The model was used to describe the transitions between health states for women and infants and reflect the change of both costs and health benefits associated with implementing the programme. Results gained from the mixed models and other relevant evidence identified were synthesised appropriately to inform the input parameters of the model. Incremental cost-effectiveness ratios of the programme were calculated for the two groups of intervention counties over time. Uncertainty surrounding the parameters was dealt with using probabilistic sensitivity analysis, and uncertainty relating to model assumptions was handled using scenario analysis.
Finally, the expected value of perfect information for both the whole model and individual parameters in the model was estimated to inform priority setting in further research in this area. The annual change rates of the antenatal care rate and the institutionalised delivery rate were improved significantly in the intervention counties after the programme was implemented. Significant improvements were also found in the annual change rates of the maternal mortality ratio, the infant mortality rate, the incidence rate of neonatal tetanus and the mortality rate of neonatal tetanus in the intervention counties after the implementation of the programme. The annual change rate of the neonatal mortality rate was also improved, although the improvement was only close to statistical significance. The influences of the socio-economic factors on the health service use indicators and birth outcomes were identified. The rural income per capita had a significant positive impact on the health service use indicators, and a significant negative impact on the birth outcomes. The number of beds in healthcare institutions per 1,000 population and the number of rural telephone subscribers per 1,000 were found to be significantly and positively related to the institutionalised delivery rate. The length of highway per square kilometre negatively influenced the maternal mortality ratio. The percentage of employed persons in the primary industry had a significant negative impact on the institutionalised delivery rate, and a significant positive impact on the infant mortality rate and neonatal mortality rate. The incremental costs of implementing the programme over the existing practice were US $11.1 million from the societal perspective, and US $13.8 million from the perspective of the Ministry of Health.
Overall, 28,711 life years were generated by the programme, producing an overall incremental cost-effectiveness ratio of US $386 from the societal perspective, and US $480 from the perspective of the Ministry of Health, both of which were below the threshold willingness-to-pay ratio of US $675. The expected net monetary benefit generated by the programme was US $8.3 million from the societal perspective, and US $5.5 million from the perspective of the Ministry of Health. The overall probability that the programme was cost-effective was 0.93 and 0.89 from the two perspectives, respectively. The incremental cost-effectiveness ratio of the programme was insensitive to the different estimates of the three parameters relating to the model assumptions. Further research could be conducted to reduce the uncertainty surrounding the decision, in which the upper limit of investment was US $0.6 million from the societal perspective, and US $1.3 million from the perspective of the Ministry of Health. It is also worthwhile to get a more precise estimate of the improvement of infant mortality rate. The population expected value of perfect information for individual parameters associated with this parameter was US $0.99 million from the societal perspective, and US $1.14 million from the perspective of the Ministry of Health. The findings from this study have shown that the interventions in the Safe Motherhood Programme were both effective and cost-effective in increasing health service use and improving birth outcomes in rural areas of Guangxi, China. Therefore, the programme represents a good public health investment and should be adopted and further expanded to an even broader area if possible. This research provides economic evidence to inform efficient decision making in improving maternal and infant health in developing countries.
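The headline figures in this abstract can be checked with the standard definitions of the incremental cost-effectiveness ratio (incremental cost divided by incremental effect) and net monetary benefit (threshold times effect, minus cost). The sketch below uses only the rounded numbers quoted in the abstract, so small differences from the reported values are to be expected; the function and variable names are illustrative, not the thesis author's.

```python
def icer(incremental_cost, incremental_effect):
    """Incremental cost-effectiveness ratio: cost per unit of health gain."""
    return incremental_cost / incremental_effect


def net_monetary_benefit(incremental_cost, incremental_effect, threshold):
    """Health gain valued at the willingness-to-pay threshold, minus cost."""
    return threshold * incremental_effect - incremental_cost


# Rounded inputs as quoted in the abstract (US dollars, life years).
LIFE_YEARS = 28_711
THRESHOLD = 675.0
INCREMENTAL_COSTS = {
    "societal": 11.1e6,
    "ministry_of_health": 13.8e6,
}

if __name__ == "__main__":
    for perspective, cost in INCREMENTAL_COSTS.items():
        ratio = icer(cost, LIFE_YEARS)
        nmb = net_monetary_benefit(cost, LIFE_YEARS, THRESHOLD)
        verdict = "below" if ratio < THRESHOLD else "above"
        print(f"{perspective}: ICER ~ ${ratio:,.0f} per life year "
              f"({verdict} threshold), NMB ~ ${nmb / 1e6:.1f} million")
```

With these rounded inputs both ratios fall below the US $675 threshold, consistent with the abstract's conclusion that the programme was cost-effective from both perspectives.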

Robust designs for Poisson regression models

Relevance: 20.00%

Abstract:

We consider the problem of how to construct robust designs for Poisson regression models. An analytical expression is derived for robust designs for first-order Poisson regression models where uncertainty exists in the prior parameter estimates. Given certain constraints in the methodology, it may be necessary to extend the robust designs for implementation in practical experiments. With these extensions, our methodology constructs designs which perform similarly, in terms of estimation, to current techniques, and offers the solution in a more timely manner. We further apply this analytic result to cases where uncertainty exists in the linear predictor. The application of this methodology to practical design problems such as screening experiments is explored. Given the minimal prior knowledge that is usually available when conducting such experiments, it is recommended to derive designs robust across a variety of systems. However, incorporating such uncertainty into the design process can be a computationally intense exercise. Hence, our analytic approach is explored as an alternative.

The association between playgroup participation, learning competence and social-emotional wellbeing for children aged 4-5 years in Australia

Relevance: 20.00%

Abstract:

Data from Growing Up in Australia: The Longitudinal Study of Australian Children is used to examine the associations between playgroup participation and the outcomes for children aged 4 to 5 years. Controlling for a range of socio-economic and family characteristics, playgroup participation across the ages of 0-3 years was used to predict learning competence and social-emotional functioning outcomes at age 4-5 years. For learning competence, both boys and girls from disadvantaged families scored 3-4% higher if they attended playgroup when aged 0-1 and 2-3 years compared to boys and girls from disadvantaged families who did not attend playgroup. For social and emotional functioning, girls from disadvantaged families who attended playgroup when they were aged 0-1 and 2-3 years scored nearly 5% higher than those who did not attend. Demographic characteristics also showed that disadvantaged families were the families least likely to access these services. Despite data limitations, this study provides evidence that continued participation in playgroups is associated with better outcomes for children from disadvantaged families.

Large scale participatory acoustic sensor data analysis : tools and reputation models to enhance effectiveness

Relevance: 20.00%

Abstract:

Acoustic sensors play an important role in augmenting the traditional biodiversity monitoring activities carried out by ecologists and conservation biologists. With this ability however comes the burden of analysing large volumes of complex acoustic data. Given the complexity of acoustic sensor data, fully automated analysis for a wide range of species is still a significant challenge. This research investigates the use of citizen scientists to analyse large volumes of environmental acoustic data in order to identify bird species. Specifically, it investigates ways in which the efficiency of a user can be improved through the use of species identification tools and the use of reputation models to predict the accuracy of users with unidentified skill levels. Initial experimental results are reported.

Leveraging Web 2.0 data for scalable semi-supervised learning of domain-specific sentiment lexicons

Relevance: 20.00%

Abstract:

Since manually constructing domain-specific sentiment lexicons is extremely time-consuming and may not even be feasible for domains where linguistic expertise is not available, research on the automatic construction of domain-specific sentiment lexicons has become a hot topic in recent years. The main contribution of this paper is the illustration of a novel semi-supervised learning method which exploits both term-to-term and document-to-term relations hidden in a corpus for the construction of domain-specific sentiment lexicons. More specifically, the proposed two-pass pseudo-labeling method combines shallow linguistic parsing and corpus-based statistical learning to make domain-specific sentiment extraction scalable with respect to the sheer volume of opinionated documents archived on the Internet these days. Another novelty of the proposed method is that it can utilize the readily available user-contributed labels of opinionated documents (e.g., the user ratings of product reviews) to bootstrap the performance of sentiment lexicon construction. Our experiments show that the proposed method can generate high-quality domain-specific sentiment lexicons as directly assessed by human experts. Moreover, the system-generated domain-specific sentiment lexicons can improve polarity prediction tasks at the document level by 2.18% when compared to other well-known baseline methods. Our research opens the door to the development of practical and scalable methods for domain-specific sentiment analysis.

New variational Bayesian approaches for statistical data mining : with applications to profiling and differentiating habitual consumption behaviour of customers in the wireless telecommunication industry

Relevance: 20.00%

Abstract:

This thesis investigates profiling and differentiating customers through the use of statistical data mining techniques. The business application of our work centres on examining individuals’ seldom studied yet critical consumption behaviour over an extensive time period within the context of the wireless telecommunication industry; consumption behaviour (as opposed to purchasing behaviour) is behaviour that has been performed so frequently that it becomes habitual and involves minimal intention or decision making. Key variables investigated are the activity initialised timestamp and cell tower location as well as the activity type and usage quantity (e.g., voice call with duration in seconds); and the research focuses on customers’ spatial and temporal usage behaviour. The main methodological emphasis is on the development of clustering models based on Gaussian mixture models (GMMs) which are fitted with the use of the recently developed variational Bayesian (VB) method. VB is an efficient deterministic alternative to the popular but computationally demanding Markov chain Monte Carlo (MCMC) methods. The standard VB-GMM algorithm is extended by allowing component splitting such that it is robust to initial parameter choices and can automatically and efficiently determine the number of components. The new algorithm we propose allows more effective modelling of individuals’ highly heterogeneous and spiky spatial usage behaviour, or more generally human mobility patterns; the term spiky describes data patterns with large areas of low probability mixed with small areas of high probability. Customers are then characterised and segmented based on the fitted GMM which corresponds to how each of them uses the products/services spatially in their daily lives; this is essentially their likely lifestyle and occupational traits.
Other significant research contributions include fitting GMMs using VB to circular data, i.e., the temporal usage behaviour, and developing clustering algorithms suitable for high-dimensional data based on the use of VB-GMM.

Building an Australian user community for Vivo : profiling research data for the Australian National Data Service

Relevance: 20.00%

The role of idea novelty and relatedness in nascent ventures

Relevance: 20.00%

Abstract:

The study of venture idea characteristics and the contextual fit between venture ideas and individuals are key research goals in entrepreneurship (Davidsson, 2004). However, to date there has been limited scholarly attention given to these phenomena. Accordingly, this study aims to help fill the gap by investigating the importance of novelty and relatedness of venture ideas in entrepreneurial firms. On the premise that new venture creation is a process and that research should be focused on the early stages of the venturing process, this study primarily focuses its attention on examining how venture idea novelty and relatedness affect the performance in the venture creation process. Different types and degrees of novelty are considered here. Relatedness is shown to be based on individuals’ prior knowledge and resource endowment. Performance in the venture creation process is evaluated according to four possible outcomes: making progress, getting operational, being terminated and achieving positive cash flow. A theoretical model is developed demonstrating the relationship between these variables along with the investment of time and money. Several hypotheses are developed to be tested. Among them, it is hypothesised that novelty hinders short term performance in the venture creation process. On the other hand knowledge and resource relatedness are hypothesised to promote performance. An experimental study was required in order to understand how different types and degrees of novelty and relatedness of venture ideas affect the attractiveness of venture ideas in the eyes of experienced entrepreneurs. Thus, the empirical work in this thesis was based on two separate studies. In the first one, a conjoint analysis experiment was conducted on 32 experienced entrepreneurs in order to ascertain attitudinal preferences regarding venture idea attractiveness based on novelty, relatedness and potential financial gains. 
This helped to estimate utility values for different levels of different attributes of venture ideas and their relative importance in the attractiveness. The second study was a longitudinal investigation of how venture idea novelty and relatedness affect the performance in the venture creation process. The data for this study is from the Comprehensive Australian Study for Entrepreneurial Emergence (CAUSEE) project that has been established in order to explore the new venture creation process in Australia. CAUSEE collects data from a representative sample of over 30,000 households in Australia using random digit dialling (RDD) telephone interviews. From these cases, data was collected at two points in time during a 12 month period from 493 firms, who are currently involved in the start-up process. Hypotheses were tested and inferences were derived through descriptive statistics, confirmatory factor analysis and structural equation modelling. Results of study 1 indicate that venture idea characteristics have a role in the attractiveness and entrepreneurs prefer to introduce a moderate degree of novelty across all types of venture ideas concerned. Knowledge relatedness is demonstrated to be a more significant factor in attractiveness than resource relatedness. Results of study 2 show that the novelty hinders nascent venture performance. On the other hand, resource relatedness has a positive impact on performance unlike knowledge relatedness which has none. The results of these studies have important implications for potential entrepreneurs, investors, researchers, consultants etc. by developing a better understanding in the venture creation process and its success factors in terms of both theory and practice.

Sit versus stand : can sitting be accurately identified using MTI accelerometer data?

Relevance: 20.00%

Abstract:

High levels of sitting have been linked with poor health outcomes. Previously a pragmatic MTI accelerometer data cut-point (100 counts/min) has been used to estimate sitting. Data on the accuracy of this cut-point are unavailable. PURPOSE: To ascertain whether the 100 counts/min cut-point accurately isolates sitting from standing activities. METHODS: Participants fitted with an MTI accelerometer were observed performing a range of sitting, standing, light and moderate activities. 1-min epoch MTI data were matched to observed activities, then re-categorised as either sitting or not using the 100 counts/min cut-point. Self-reported demographics and current physical activity were collected. Generalized estimating equation (GEE) analyses for repeated measures with a binary logistic model, corrected for age, gender and BMI, were conducted to ascertain the odds of the MTI data being misclassified. RESULTS: Data were from 26 healthy subjects (8 men; 50% aged <25 years; mean BMI (SD) 22.7 (3.8) kg/m²). The mode of the MTI sitting and standing data was 0 counts/min, with 46% of sitting activities and 21% of standing activities recording 0 counts/min. The GEE was unable to accurately isolate sitting from standing activities using the 100 counts/min cut-point, since all sitting activities were incorrectly predicted as standing (p=0.05). To further explore the sensitivity of MTI data to delineate sitting from standing, the upper limit of the 95% confidence interval of the mean for the sitting activities (46 counts/min) was used to re-categorise the data; this resulted in the GEE correctly classifying 49% of sitting, and 69% of standing activities. Using the 100 counts/min cut-point the data were re-categorised into a combined ‘sit/stand’ category and tested against other light activities: 88% of sit/stand and 87% of light activities were accurately predicted. Using Freedson’s moderate cut-point of 1952 counts/min the GEE accurately predicted 97% of light vs. 90% of moderate activities.
CONCLUSION: The distributions of MTI-recorded sitting and standing data overlap considerably; as such, the 100 counts/min cut-point did not accurately isolate sitting from other static standing activities. The 100 counts/min cut-point more accurately predicted sit/stand vs. other movement-orientated activities.
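The cut-point logic evaluated in this abstract amounts to thresholding each 1-min epoch. The sketch below is illustrative only: the cut-point values come from the abstract, but the handling of values exactly at a boundary, the function name and the sample epochs are assumptions.

```python
SEDENTARY_CUT_POINT = 100   # counts/min, the pragmatic cut-point under test
MODERATE_CUT_POINT = 1952   # counts/min, Freedson's moderate cut-point


def classify_epoch(counts_per_min):
    """Label a 1-min accelerometer epoch by count thresholds.

    Assumption: values exactly at a cut-point fall into the higher category.
    """
    if counts_per_min < SEDENTARY_CUT_POINT:
        return "sit/stand"
    if counts_per_min < MODERATE_CUT_POINT:
        return "light"
    return "moderate"


if __name__ == "__main__":
    # Hypothetical epochs: (observed activity, recorded counts/min).
    epochs = [("sitting", 0), ("standing", 0), ("standing", 46),
              ("slow walk", 850), ("brisk walk", 2600)]
    for observed, counts in epochs:
        print(f"{observed:>10}: {counts:>5} counts/min -> {classify_epoch(counts)}")
```

Because quiet sitting and quiet standing can both record 0 counts/min, a threshold on counts alone gives them the same label, which matches the overlap the study reports.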

Arterial traffic congestion analysis using Bluetooth duration data

Relevance: 20.00%

Abstract:

The aim of this study is to assess the potential use of Bluetooth data for traffic monitoring of arterial road networks. Bluetooth data provides the direct measurement of travel time between pairs of scanners, and intensive research has been reported on this topic. Bluetooth data includes “Duration” data, which represents the time spent by Bluetooth devices to pass through the detection range of Bluetooth scanners. If the scanners are located at signalised intersections, this Duration can be related to intersection performance, and hence represents valuable information for traffic monitoring. However the use of Duration has been ignored in previous analyses. In this study, the Duration data as well as travel time data is analysed to capture the traffic condition of a main arterial route in Brisbane. The data consists of one week of Bluetooth data provided by Brisbane City Council. As well, micro simulation analysis is conducted to further investigate the properties of Duration. The results reveal characteristics of Duration, and address future research needs to utilise this valuable data source.

A traffic simulation standard based on data marts

Relevance: 20.00%

Abstract:

Traffic simulation models tend to have their own data input and output formats. In an effort to standardise the input for traffic simulations, we introduce in this paper a set of data marts that aim to serve as a common interface between the necessary data, stored in dedicated databases, and the software packages that require the input in a certain format. The data marts are developed based on real-world objects (e.g. roads, traffic lights, controllers) rather than abstract models and hence contain all necessary information that can be transformed by the importing software package to its needs. The paper contains a full description of the data marts for network coding, simulation results, and scenario management, which have been discussed with industry partners to ensure sustainability.

Towards testing the eclectic paradigm on multinational contracting: an approach to reviewing and analysing secondary data

Relevance: 20.00%

Abstract:

In response to the need to leverage private finance and the lack of competition in some parts of the Australian public sector major infrastructure market, especially in very large economic infrastructure procured using Public Private Partnerships, the Australian Federal government has demonstrated its desire to attract new sources of in-bound foreign direct investment (FDI) into the Australian construction market. This paper aims to report on progress towards an investigation into the determinants of multinational contractors’ willingness to bid for Australian public sector major infrastructure projects, which is designed to give an improved understanding of matters surrounding FDI into the Australian construction sector. This research deploys Dunning’s eclectic theory for the first time in terms of in-bound FDI by multinational contractors as head contractors bidding for Australian major infrastructure public sector projects. Elsewhere, the authors have developed Dunning’s principal hypothesis associated with his eclectic framework in order to suit the context of this research and to address a weakness arising in Dunning’s principal hypothesis, which is based on a nominal approach to the factors in the eclectic framework and which fails to speak to the relative explanatory power of these factors. In this paper, an approach to reviewing and analysing secondary data, as part of the first stage investigation in this research, is developed and some illustrations given, vis-à-vis the selected sector (roads, bridges and tunnels) in Australia (as the host location) and using one of the selected home countries (Spain). In conclusion, some tentative thoughts are offered in anticipation of the completion of the first stage investigation - in terms of the extent to which this first stage based on secondary data only might suggest the relative importance of the factors in the eclectic framework.
It is noted that more robust conclusions are expected following the future planned stages of the research and these stages including primary data are briefly outlined. Finally, and beyond theoretical contributions expected from the overall approach taken to developing and testing Dunning’s framework, other expected contributions concerning research method and practical implications are mentioned.

Hormones down under: Hormone therapy use after the Women's Health Initiative

Relevance: 20.00%

Abstract:

Aims: The primary objective was to describe the usage pattern of hormone therapy (HT) in a sample of urban Australian women in 2001 and to assess the characteristics of users vs. non-users. The second objective was to determine whether there had been any change in usage since the publication of the results of the combined oestrogen plus progestagen arm of the Women's Health Initiative (WHI) in 2002. Methods: A cohort of 374 postmenopausal women aged 50–80 years participated in this substudy of the LAW (Longitudinal Assessment of Ageing in Women) project: a 5-year multidisciplinary, observational study. Participants completed an annual medical assessment including details of the use of HT and the reasons for use, as well as demographic and psychosocial data. Results: In December 2001, 30.8% of the participants were using HT, whereas 55.4% were ever users. The management of vasomotor symptoms and mood disturbance were the primary reasons for use. Of those who had been using HT in December 2001, 24.4% ceased using HT in the 3 months following publication of the WHI results. The percentage of women using HT in December 2003 (13.9%) was less than half of that of December 2001. Conclusion: The rate of HT use and the reasons for use in Brisbane in 2001 were similar to those of other Australian regions. Usage of HT has decreased since the publication of the WHI results in 2002, which may reflect changing attitudes by patients and practitioners regarding HT.

Understanding the legal implications of data sharing, access and reuse in the Australian research landscape

Relevance: 20.00%

Abstract:

Researchers are increasingly involved in data-intensive research projects that cut across geographic and disciplinary borders. Quality research now often involves virtual communities of researchers participating in large-scale web-based collaborations, opening their early-stage research to the research community in order to encourage broader participation and accelerate discoveries. The result of such large-scale collaborations has been the production of ever-increasing amounts of data. In short, we are in the midst of a data deluge. Accompanying these developments has been a growing recognition that if the benefits of enhanced access to research are to be realised, it will be necessary to develop the systems and services that enable data to be managed and secured. It has also become apparent that to achieve seamless access to data it is necessary not only to adopt appropriate technical standards, practices and architecture, but also to develop legal frameworks that facilitate access to and use of research data. This chapter provides an overview of the current research landscape in Australia as it relates to the collection, management and sharing of research data. The chapter then explains the Australian legal regimes relevant to data, including copyright, patent, privacy, confidentiality and contract law. Finally, this chapter proposes the infrastructure elements that are required for the proper management of legal interests, ownership rights and rights to access and use data collected or generated by research projects.
