Digital Library

846 results for resistant data

Leveraging Web 2.0 data for scalable semi-supervised learning of domain-specific sentiment lexicons

Relevance:

20.00%

Abstract:

Manually constructing domain-specific sentiment lexicons is extremely time-consuming, and it may not even be feasible for domains where linguistic expertise is not available. As a result, research on the automatic construction of domain-specific sentiment lexicons has become a hot topic in recent years. The main contribution of this paper is the illustration of a novel semi-supervised learning method which exploits both term-to-term and document-to-term relations hidden in a corpus for the construction of domain-specific sentiment lexicons. More specifically, the proposed two-pass pseudo-labeling method combines shallow linguistic parsing and corpus-based statistical learning to make domain-specific sentiment extraction scalable with respect to the sheer volume of opinionated documents archived on the Internet these days. Another novelty of the proposed method is that it can utilize the readily available user-contributed labels of opinionated documents (e.g., the user ratings of product reviews) to bootstrap the performance of sentiment lexicon construction. Our experiments show that the proposed method can generate high-quality domain-specific sentiment lexicons as directly assessed by human experts. Moreover, the system-generated domain-specific sentiment lexicons can improve polarity prediction tasks at the document level by 2.18% when compared to other well-known baseline methods. Our research opens the door to the development of practical and scalable methods for domain-specific sentiment analysis.
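
As a rough illustration of how user ratings can serve as pseudo-labels for lexicon construction, the sketch below scores each term by a smoothed log-odds of appearing in high-rated versus low-rated reviews. It is not the two-pass method described in the abstract; the function name `build_lexicon`, the rating thresholds, and the toy reviews are all assumptions made for the example.

```python
import math
from collections import Counter

def build_lexicon(reviews, pos_threshold=4, neg_threshold=2):
    """Score terms by smoothed log-odds of occurring in positive vs negative reviews.

    `reviews` is a list of (text, star_rating) pairs. The thresholds are
    illustrative assumptions, not values taken from the paper.
    """
    pos_counts, neg_counts = Counter(), Counter()
    for text, rating in reviews:
        tokens = text.lower().split()
        if rating >= pos_threshold:      # pseudo-label: positive document
            pos_counts.update(tokens)
        elif rating <= neg_threshold:    # pseudo-label: negative document
            neg_counts.update(tokens)
        # mid-range ratings are left unlabeled

    vocab = set(pos_counts) | set(neg_counts)
    # Add-one smoothing so unseen terms do not produce log(0).
    pos_total = sum(pos_counts.values()) + len(vocab)
    neg_total = sum(neg_counts.values()) + len(vocab)
    return {
        term: math.log((pos_counts[term] + 1) / pos_total)
        - math.log((neg_counts[term] + 1) / neg_total)
        for term in vocab
    }

# Toy, made-up reviews for demonstration only.
reviews = [
    ("great battery great screen", 5),
    ("poor battery poor support", 1),
    ("great value", 4),
    ("poor screen", 2),
]
lexicon = build_lexicon(reviews)
```

In this toy corpus, terms that occur equally often on both sides (such as "battery") score zero, while "great" and "poor" receive positive and negative scores respectively.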

New variational Bayesian approaches for statistical data mining : with applications to profiling and differentiating habitual consumption behaviour of customers in the wireless telecommunication industry

Relevance:

20.00%

Abstract:

This thesis investigates profiling and differentiating customers through the use of statistical data mining techniques. The business application of our work centres on examining individuals’ seldom studied yet critical consumption behaviour over an extensive time period within the context of the wireless telecommunication industry; consumption behaviour (as opposed to purchasing behaviour) is behaviour that has been performed so frequently that it becomes habitual and involves minimal intention or decision making. Key variables investigated are the activity initialised timestamp and cell tower location as well as the activity type and usage quantity (e.g., voice call with duration in seconds); the research focuses on customers’ spatial and temporal usage behaviour. The main methodological emphasis is on the development of clustering models based on Gaussian mixture models (GMMs) which are fitted with the use of the recently developed variational Bayesian (VB) method. VB is an efficient deterministic alternative to the popular but computationally demanding Markov chain Monte Carlo (MCMC) methods. The standard VB-GMM algorithm is extended by allowing component splitting such that it is robust to initial parameter choices and can automatically and efficiently determine the number of components. The new algorithm we propose allows more effective modelling of individuals’ highly heterogeneous and spiky spatial usage behaviour, or more generally human mobility patterns; the term spiky describes data patterns with large areas of low probability mixed with small areas of high probability. Customers are then characterised and segmented based on the fitted GMM which corresponds to how each of them uses the products/services spatially in their daily lives; this is essentially their likely lifestyle and occupational traits.
Other significant research contributions include fitting GMMs using VB to circular data, i.e., the temporal usage behaviour, and developing clustering algorithms suitable for high-dimensional data based on the use of VB-GMM.
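
To show why time-of-day data needs circular rather than linear treatment, here is a minimal sketch that maps hours onto the unit circle and compares the circular mean with the arithmetic mean. It does not reproduce the thesis's VB-GMM fitting; the function name `circular_mean_hour` and the sample call times are assumptions for illustration.

```python
import math

def circular_mean_hour(hours):
    """Return the circular mean of times of day given in hours on a 24-hour clock."""
    angles = [2 * math.pi * h / 24 for h in hours]
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    mean_angle = math.atan2(s, c)          # direction of the resultant vector
    return (mean_angle * 24 / (2 * math.pi)) % 24

# Hypothetical late-night activity timestamps (hours of the day).
late_night_calls = [23, 1, 3]
circ = circular_mean_hour(late_night_calls)                # close to 1:00
arith = sum(late_night_calls) / len(late_night_calls)      # 9:00, which is misleading
```

The arithmetic mean lands in mid-morning because it ignores the wrap-around at midnight, which is the kind of distortion a model for circular data avoids.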

Building an Australian user community for Vivo : profiling research data for the Australian National Data Service

Relevance:

20.00%

The role of soil characteristics on temperature sensitivity of soil organic matter

Relevance:

20.00%

Abstract:

The uncertainty associated with how projected climate change will affect global C cycling could have a large impact on predictions of soil C stocks. The purpose of our study was to determine how various soil decomposition and chemistry characteristics relate to soil organic matter (SOM) temperature sensitivity. We accomplished this objective using long-term soil incubations at three temperatures (15, 25, and 35°C) and pyrolysis molecular beam mass spectrometry (py-MBMS) on 12 soils from 6 sites along a mean annual temperature (MAT) gradient (2–25.6°C). The Q10 values calculated from the CO2 respired during a long-term incubation using the Q10-q method showed decomposition of the more resistant fraction to be more temperature sensitive, with a Q10-q of 1.95 ± 0.08 for the labile fraction and a Q10-q of 3.33 ± 0.04 for the more resistant fraction. We compared the fit of soil respiration data using a two-pool model (active and slow) with first-order kinetics with a three-pool model and found that the two- and three-pool models statistically fit the data equally well. The three-pool model changed the size and rate constant for the more resistant pool. The size of the active pool in these soils, calculated using the two-pool model, increased with incubation temperature and ranged from 0.1 to 14.0% of initial soil organic C. Sites with an intermediate MAT and lowest C/N ratio had the largest active pool. Pyrolysis molecular beam mass spectrometry showed declines in carbohydrates with conversion from grassland to wheat cultivation and a greater amount of protected carbohydrates in allophanic soils, which may have led to differences found between the total amount of CO2 respired, the size of the active pool, and the Q10-q values of the soils.
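
For readers unfamiliar with the quantities involved, the sketch below shows the conventional Q10 ratio and a two-pool first-order decay curve. This is the textbook Q10 definition, not the Q10-q method used in the study, and every rate, pool size, and rate constant here is a hypothetical value chosen for illustration.

```python
import math

def q10(rate_low, rate_high, temp_low, temp_high):
    """Conventional Q10: factor by which a rate increases per 10 degC of warming."""
    return (rate_high / rate_low) ** (10.0 / (temp_high - temp_low))

def two_pool_remaining(t, c_active, k_active, c_slow, k_slow):
    """Carbon remaining at time t for an active and a slow pool with first-order decay."""
    return c_active * math.exp(-k_active * t) + c_slow * math.exp(-k_slow * t)

# Hypothetical respiration rates: doubling between 15 and 25 degC gives Q10 = 2.
q10_example = q10(rate_low=1.0, rate_high=2.0, temp_low=15.0, temp_high=25.0)

# Hypothetical pools: 5% active (fast turnover), 95% slow; time in days.
remaining_day_100 = two_pool_remaining(
    t=100.0, c_active=5.0, k_active=0.05, c_slow=95.0, k_slow=0.0005
)
```

With these made-up parameters the active pool is almost exhausted by day 100 while the slow pool has barely changed, which is why the two pools can show different temperature sensitivities.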

Sit versus stand : can sitting be accurately identified using MTI accelerometer data?

Relevance:

20.00%

Abstract:

High levels of sitting have been linked with poor health outcomes. Previously a pragmatic MTI accelerometer data cut-point (100 counts/min) has been used to estimate sitting. Data on the accuracy of this cut-point are unavailable. PURPOSE: To ascertain whether the 100 counts/min cut-point accurately isolates sitting from standing activities. METHODS: Participants fitted with an MTI accelerometer were observed performing a range of sitting, standing, light and moderate activities. 1-min epoch MTI data were matched to observed activities, then re-categorized as either sitting or not using the 100 counts/min cut-point. Self-reported demographics and current physical activity were collected. Generalized estimating equation (GEE) analyses for repeated measures with a binary logistic model, corrected for age, gender and BMI, were conducted to ascertain the odds of the MTI data being misclassified. RESULTS: Data were from 26 healthy subjects (8 men; 50% aged <25 years; mean BMI (SD) 22.7 (3.8) kg/m²). The mode of the MTI sitting and standing data was 0 counts/min, with 46% of sitting activities and 21% of standing activities recording 0 counts/min. The GEE was unable to accurately isolate sitting from standing activities using the 100 counts/min cut-point, since all sitting activities were incorrectly predicted as standing (p=0.05). To further explore the sensitivity of MTI data to delineate sitting from standing, the upper 95% confidence interval of the mean for the sitting activities (46 counts/min) was used to re-categorise the data; this resulted in the GEE correctly classifying 49% of sitting and 69% of standing activities. Using the 100 counts/min cut-point, the data were re-categorised into a combined ‘sit/stand’ category and tested against other light activities: 88% of sit/stand and 87% of light activities were accurately predicted. Using Freedson’s moderate cut-point of 1952 counts/min, the GEE accurately predicted 97% of light vs. 90% of moderate activities.
CONCLUSION: The distributions of MTI-recorded sitting and standing data overlap considerably; as such, the 100 counts/min cut-point did not accurately isolate sitting from other static standing activities. The 100 counts/min cut-point more accurately predicted sit/stand vs. other movement-orientated activities.
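
The cut-point logic discussed in the abstract can be sketched as a simple threshold classifier over 1-min epochs. The cut-point values (100 and 1952 counts/min) come from the abstract; the function name, the category labels, and the treatment of values exactly on a boundary are assumptions, and the epoch values are invented.

```python
def classify_epoch(counts_per_min, sedentary_cut=100, moderate_cut=1952):
    """Assign a 1-min accelerometer epoch to an intensity category by cut-point.

    Assumption: values below a cut-point fall in the lower category; the
    abstract does not state how boundary values were handled.
    """
    if counts_per_min < sedentary_cut:
        return "sit/stand"
    if counts_per_min < moderate_cut:
        return "light"
    return "moderate+"

# Made-up epochs: a sitting epoch and a standing epoch can both read 0 counts/min,
# so a count threshold alone cannot tell them apart.
epochs = [0, 46, 350, 2500]
labels = [classify_epoch(c) for c in epochs]
```

This mirrors the study's conclusion at a glance: the threshold separates static postures from movement, but it has no information with which to separate sitting from standing.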

Arterial traffic congestion analysis using Bluetooth duration data

Relevance:

20.00%

Abstract:

The aim of this study is to assess the potential use of Bluetooth data for traffic monitoring of arterial road networks. Bluetooth data provides a direct measurement of travel time between pairs of scanners, and intensive research has been reported on this topic. Bluetooth data also includes “Duration” data, which represents the time spent by Bluetooth devices passing through the detection range of Bluetooth scanners. If the scanners are located at signalised intersections, this Duration can be related to intersection performance, and hence represents valuable information for traffic monitoring. However, the use of Duration has been ignored in previous analyses. In this study, the Duration data as well as travel time data are analysed to capture the traffic condition of a main arterial route in Brisbane. The data consists of one week of Bluetooth data provided by Brisbane City Council. In addition, microsimulation analysis is conducted to further investigate the properties of Duration. The results reveal characteristics of Duration, and address future research needs to utilise this valuable data source.
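
The two quantities contrasted in the abstract can be illustrated with a small sketch that derives Duration (time within one scanner's range) and travel time (time between two scanners) from raw detection records. The record layout, the function names, and the choice to measure travel time between first detections are assumptions for this example, not details from the study.

```python
def summarise_detections(detections):
    """Map (device_id, scanner_id) to its (first, last) detection time in seconds."""
    first_last = {}
    for device, scanner, ts in detections:
        key = (device, scanner)
        first, last = first_last.get(key, (ts, ts))
        first_last[key] = (min(first, ts), max(last, ts))
    return first_last

def duration(first_last, device, scanner):
    """Time the device spent inside one scanner's detection range."""
    first, last = first_last[(device, scanner)]
    return last - first

def travel_time(first_last, device, upstream, downstream):
    """Time between scanners; assumes first-to-first detection matching."""
    return first_last[(device, downstream)][0] - first_last[(device, upstream)][0]

# Hypothetical records: (device_id, scanner_id, timestamp in seconds).
detections = [
    ("aa", "A", 0), ("aa", "A", 12), ("aa", "A", 30),   # long dwell at scanner A
    ("aa", "B", 95), ("aa", "B", 101),                  # short pass at scanner B
]
summary = summarise_detections(detections)
dwell_a = duration(summary, "aa", "A")           # 30 s
dwell_b = duration(summary, "aa", "B")           # 6 s
trip = travel_time(summary, "aa", "A", "B")      # 95 s
```

A long dwell at a scanner placed at a signalised intersection, as in the made-up data for scanner A, is the kind of signal the abstract suggests could reflect queuing or delay.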

A traffic simulation standard based on data marts

Relevance:

20.00%

Abstract:

Traffic simulation models tend to have their own data input and output formats. In an effort to standardise the input for traffic simulations, we introduce in this paper a set of data marts that aim to serve as a common interface between the necessary data, stored in dedicated databases, and the software packages that require the input in a certain format. The data marts are developed based on real-world objects (e.g. roads, traffic lights, controllers) rather than abstract models and hence contain all necessary information that can be transformed by the importing software package to suit its needs. The paper contains a full description of the data marts for network coding, simulation results, and scenario management, which have been discussed with industry partners to ensure sustainability.

Screening, isolation, and decolonisation strategies in the control of meticillin resistant Staphylococcus aureus in intensive care units : cost effectiveness evaluation

Relevance:

20.00%

Abstract:

Objective: To assess the cost-effectiveness of screening, isolation and decolonisation strategies in the control of methicillin-resistant Staphylococcus aureus (MRSA) in intensive care units (ICUs). Design: Economic evaluation. Setting: England and Wales. Population: ICU patients. Main outcome measures: Infections, deaths, costs, quality adjusted life years (QALYs), incremental cost-effectiveness ratios for alternative strategies, net monetary benefits (NMBs). Results: All strategies using isolation but not decolonisation improved health outcomes but increased costs. When MRSA prevalence on admission to the ICU was 5% and the willingness to pay per QALY gained was between £20,000 and £30,000, the best such strategy was to isolate only those patients at high risk of carrying MRSA (either pre-emptively or following identification by admission and weekly MRSA screening using chromogenic agar). Universal admission and weekly screening using polymerase chain reaction (PCR)-based MRSA detection coupled with isolation was unlikely to be cost-effective unless prevalence was high (10% colonised with MRSA on admission to the ICU). All decolonisation strategies improved health outcomes and reduced costs. While universal decolonisation (regardless of MRSA status) was the most cost-effective in the short-term, strategies using screening to target MRSA carriers may be preferred due to reduced risk of selecting for resistance. Amongst such targeted strategies, universal admission and weekly PCR screening coupled with decolonisation with nasal mupirocin was the most cost-effective. This finding was robust to ICU size, MRSA admission prevalence, the proportion of patients classified as high-risk, and the precise value of willingness to pay for health benefits. 
Conclusions: MRSA control strategies that use decolonisation are likely to be cost-saving in an ICU setting provided resistance is lacking, and combining universal PCR-based screening with decolonisation is likely to represent good value for money if untargeted decolonisation is considered unacceptable. In ICUs where decolonisation is not implemented there is insufficient evidence to support universal MRSA screening outside high prevalence settings.
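
The outcome measures named in the abstract, the incremental cost-effectiveness ratio (ICER) and net monetary benefit (NMB), follow standard definitions that can be sketched as below. All costs and QALY figures are invented for illustration and do not come from the evaluation; only the £20,000 willingness-to-pay value echoes the range mentioned in the abstract.

```python
def icer(cost_new, qalys_new, cost_ref, qalys_ref):
    """Incremental cost per QALY gained for a new strategy vs a reference."""
    return (cost_new - cost_ref) / (qalys_new - qalys_ref)

def net_monetary_benefit(qalys, cost, willingness_to_pay):
    """NMB = willingness to pay per QALY x QALYs - cost."""
    return willingness_to_pay * qalys - cost

# Hypothetical per-patient (cost in GBP, QALYs) for three strategies.
strategies = {
    "baseline": (50_000.0, 10.0),
    "isolation": (56_000.0, 10.2),
    "decolonisation": (48_000.0, 10.3),
}
wtp = 20_000.0  # GBP per QALY

nmb = {
    name: net_monetary_benefit(qalys, cost, wtp)
    for name, (cost, qalys) in strategies.items()
}
best = max(nmb, key=nmb.get)
isolation_icer = icer(*strategies["isolation"], *strategies["baseline"])
```

A strategy that both lowers cost and adds QALYs, like the made-up decolonisation row, dominates the baseline and has the highest NMB at any non-negative willingness to pay.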

Towards testing the eclectic paradigm on multinational contracting: an approach to reviewing and analysing secondary data

Relevance:

20.00%

Abstract:

In response to the need to leverage private finance and the lack of competition in some parts of the Australian public sector major infrastructure market, especially in very large economic infrastructure procured using Public Private Partnerships, the Australian Federal government has demonstrated its desire to attract new sources of in-bound foreign direct investment (FDI) into the Australian construction market. This paper reports on progress towards an investigation into the determinants of multinational contractors’ willingness to bid for Australian public sector major infrastructure projects, which is designed to give an improved understanding of matters surrounding FDI into the Australian construction sector. This research deploys Dunning’s eclectic theory for the first time in terms of in-bound FDI by multinational contractors as head contractors bidding for Australian major infrastructure public sector projects. Elsewhere, the authors have developed Dunning’s principal hypothesis associated with his eclectic framework in order to suit the context of this research and to address a weakness arising in Dunning’s principal hypothesis, which is based on a nominal approach to the factors in the eclectic framework and which fails to speak to the relative explanatory power of these factors. In this paper, an approach to reviewing and analysing secondary data, as part of the first stage investigation in this research, is developed and some illustrations are given, vis-à-vis the selected sector (roads, bridges and tunnels) in Australia (as the host location) and using one of the selected home countries (Spain). In conclusion, some tentative thoughts are offered in anticipation of the completion of the first stage investigation, in terms of the extent to which this first stage, based on secondary data only, might suggest the relative importance of the factors in the eclectic framework.
It is noted that more robust conclusions are expected following the future planned stages of the research, and these stages, which include primary data, are briefly outlined. Finally, and beyond theoretical contributions expected from the overall approach taken to developing and testing Dunning’s framework, other expected contributions concerning research method and practical implications are mentioned.

Understanding the legal implications of data sharing, access and reuse in the Australian research landscape

Relevance:

20.00%

Abstract:

Researchers are increasingly involved in data-intensive research projects that cut across geographic and disciplinary borders. Quality research now often involves virtual communities of researchers participating in large-scale web-based collaborations, opening their early-stage research to the research community in order to encourage broader participation and accelerate discoveries. The result of such large-scale collaborations has been the production of ever-increasing amounts of data. In short, we are in the midst of a data deluge. Accompanying these developments has been a growing recognition that if the benefits of enhanced access to research are to be realised, it will be necessary to develop the systems and services that enable data to be managed and secured. It has also become apparent that to achieve seamless access to data it is necessary not only to adopt appropriate technical standards, practices and architecture, but also to develop legal frameworks that facilitate access to and use of research data. This chapter provides an overview of the current research landscape in Australia as it relates to the collection, management and sharing of research data. The chapter then explains the Australian legal regimes relevant to data, including copyright, patent, privacy, confidentiality and contract law. Finally, this chapter proposes the infrastructure elements that are required for the proper management of legal interests, ownership rights and rights to access and use data collected or generated by research projects.

Feasibility of using health data sources to inform product safety surveillance in Queensland : a report for the Queensland Injury Prevention Council

Relevance:

20.00%

Abstract:

This report provides an evaluation of the current available evidence-base for identification and surveillance of product-related injuries in children in Queensland. While the focal population was children in Queensland, the identification of information needs and data sources for product safety surveillance has applicability nationally for all age groups. The report firstly summarises the data needs of product safety regulators regarding product-related injury in children, describing the current sources of information informing product safety policy and practice, and documenting the priority product surveillance areas affecting children which have been a focus over recent years in Queensland. Health data sources in Queensland which have the potential to inform product safety surveillance initiatives were evaluated in terms of their ability to address the information needs of product safety regulators. Patterns in product-related injuries in children were analysed using routinely available health data to identify areas for future intervention, and the patterns in product-related injuries in children identified in health data were compared to those identified by product safety regulators. Recommendations were made for information system improvements and improved access to and utilisation of health data for more proactive approaches to product safety surveillance in the future.

Hunters & gatherers : strategies for curriculum mapping and data collection for assurance of learning

Relevance:

20.00%

Abstract:

Assurance of learning is a predominant feature in both quality enhancement and assurance in higher education. Assurance of learning is a process that articulates explicit program outcomes and standards, and systematically gathers evidence to determine the extent to which performance matches expectations. Benefits accrue to the institution through the systematic assessment of whole-of-program goals. Data may be used for continuous improvement, program development, and to inform external accreditation and evaluation bodies. Recent developments, including the introduction of the Tertiary Education Quality and Standards Agency (TEQSA), will require universities to review the methods they use to assure learning outcomes. This project investigates two critical elements of assurance of learning: 1. the mapping of graduate attributes throughout a program; and 2. the collection of assurance of learning data. An audit was conducted with 25 of the 39 Business Schools in Australian universities to identify current methods of mapping graduate attributes and of collecting assurance of learning data across degree programs, as well as to review the key challenges faced in these areas. Our findings indicate that external drivers like professional body accreditation (for example, the Association to Advance Collegiate Schools of Business (AACSB)) and TEQSA are important motivators for assuring learning, and those who were undertaking AACSB accreditation had more robust assurance of learning systems in place. It was reassuring to see that the majority of institutions (96%) had adopted an embedding approach to assuring learning rather than opting for independent standardised testing. The main challenges that were evident were the development of sustainable processes that were not considered a burden to academic staff, and obtaining academic buy-in to the benefits of assuring learning per se rather than assurance of learning being seen as a tick-box exercise.
This cultural change is the real challenge in assurance of learning practice.

Properties of lignin and poly(hydroxybutyrate) blends

Relevance:

20.00%

Abstract:

The Queensland University of Technology (QUT) allows the presentation of a thesis for the Degree of Doctor of Philosophy in the format of published or submitted papers, where such papers have been published, accepted or submitted during the period of candidature. This thesis is composed of seven published/submitted papers and one poster presentation, of which five have been published and the other two are under review. This project is financially supported by the QUTPRA Grant. The twenty-first century started with the resurrection of lignocellulosic biomass as a potential substitute for petrochemicals. Petrochemicals, which enjoyed sustainable economic growth during the past century, have begun to reach or have reached their peak. The world energy situation is complicated by political uncertainty and by the environmental impact associated with petrochemical import and usage. In particular, greenhouse gases and toxic emissions produced by petrochemicals have been implicated as a significant cause of climate change. Lignocellulosic biomass (e.g. sugarcane biomass and bagasse), which potentially enjoys a more abundant, widely distributed, and cost-effective resource base, can play an indispensable role in the paradigm transition from a fossil-based to a carbohydrate-based economy. Poly(3-hydroxybutyrate) (PHB) has attracted much commercial interest as a plastic and biodegradable material because some of its physical properties are similar to those of polypropylene (PP), even though the two polymers have quite different chemical structures. PHB exhibits a high degree of crystallinity, has a high melting point of approximately 180°C, and most importantly, unlike PP, PHB is rapidly biodegradable. Two major factors which currently inhibit the widespread use of PHB are its high cost and poor mechanical properties. The production costs of PHB are significantly higher than for plastics produced from petrochemical resources (e.g. 
PP costs US$1 kg⁻¹, whereas PHB costs US$8 kg⁻¹), and its stiff and brittle nature makes processing difficult and impedes its ability to handle high impact. Lignin, together with cellulose and hemicellulose, is one of the three main components of every lignocellulosic biomass. It is a natural polymer occurring in the plant cell wall. Lignin, after cellulose, is the most abundant polymer in nature. It is extracted mainly as a by-product in the pulp and paper industry. Although lignin is traditionally burnt in industry for energy, it has many value-adding properties. Lignin, which to date has not been exploited, is an amorphous polymer with hydrophobic behaviour. These properties make it a good candidate for blending with PHB, and technically, blending can be a viable solution for price reduction and enhanced production properties. Theoretically, lignin and PHB affect the physicochemical properties of each other when they become miscible in a composite. A comprehensive study on structural, thermal, rheological and environmental properties of lignin/PHB blends together with neat lignin and PHB is the targeted scope of this thesis. An introduction to this research, including a description of the research problem, a literature review and an account of the research progress linking the research papers, is presented in Chapter 1. In this research, lignin was obtained from bagasse through extraction with sodium hydroxide. A novel two-step pH precipitation procedure was used to recover soda lignin with a purity of 96.3 wt% from the black liquor (i.e. the spent sodium hydroxide solution). The precipitation process is presented in Chapter 2. A sequential solvent extraction process was used to fractionate the soda lignin into three fractions. These fractions, together with the soda lignin, were characterised to determine elemental composition, purity, carbohydrate content, molecular weight, and functional group content. The thermal properties of the lignins were also determined. 
The results are presented and discussed in Chapter 2. On the basis of the type and quantity of functional groups, attempts were made to identify potential applications for each of the individual lignins. As an addendum to the general section on the development of composite materials of lignin, which includes Chapters 1 and 2, studies on the kinetics of bagasse thermal degradation are presented in Appendix 1. The work showed that distinct stages of mass loss depend on residual sucrose. As the development of value-added products from lignin will improve the economics of cellulosic ethanol, a review on lignin applications, which included lignin/PHB composites, is presented in Appendix 2. Chapters 3, 4 and 5 are dedicated to investigations of the properties of soda lignin/PHB composites. Chapter 3 reports on the thermal stability and miscibility of the blends. Although the addition of soda lignin shifts the onset of PHB decomposition to lower temperatures, the lignin/PHB blends are thermally more stable over a wider temperature range. The results from the thermal study also indicated that blends containing up to 40 wt% soda lignin were miscible. The Tg data for these blends fitted well to the Gordon-Taylor and Kwei models. Fourier transform infrared spectroscopy (FT-IR) evaluation showed that the miscibility of the blends was due to specific hydrogen bonding (and similar interactions) between reactive phenolic hydroxyl groups of lignin and the carbonyl group of PHB. The thermophysical and rheological properties of soda lignin/PHB blends are presented in Chapter 4. In this chapter, the kinetics of thermal degradation of the blends is studied using thermogravimetric analysis (TGA). This preliminary investigation is limited to the processing temperature of blend manufacturing. Of significance in the study is the drop in the apparent energy of activation, Ea, from 112 kJ mol⁻¹ for pure PHB to half that value for blends. 
This means that the addition of lignin to PHB reduces the thermal stability of PHB, and that the comparative reduced weight loss observed in the TGA data is associated with the slower rate of lignin degradation in the composite. The Tg of PHB, as well as its melting temperature, melting enthalpy, crystallinity and melting point, decrease with increase in lignin content. Results from the rheological investigation showed that at low lignin content (.30 wt%), lignin acts as a plasticiser for PHB, while at high lignin content it acts as a filler. Chapter 5 is dedicated to the environmental study of soda lignin/PHB blends. The biodegradability of lignin/PHB blends is compared to that of PHB using the standard soil burial test. To obtain acceptable biodegradation data, samples were buried for 12 months under controlled conditions. Gravimetric analysis, TGA, optical microscopy, scanning electron microscopy (SEM), differential scanning calorimetry (DSC), FT-IR, and X-ray photoelectron spectroscopy (XPS) were used in the study. The results clearly demonstrated that lignin retards the biodegradation of PHB, and that the miscible blends were more resistant to degradation compared to the immiscible blends. To obtain an understanding of the relationship between the structure of lignin and the properties of the blends, a methanol-soluble lignin, which contains 3× less phenolic hydroxyl group than its parent soda lignin used in preparing blends for the work reported in Chapters 3 and 4, was blended with PHB and the properties of the blends investigated. The results are reported in Chapter 6. At up to 40 wt% methanol-soluble lignin, the experimental data fitted the Gordon-Taylor and Kwei models, similar to the results obtained with soda lignin-based blends. However, the values obtained for the interactive parameters for the methanol-soluble lignin blends were slightly lower than those for the blends obtained with soda lignin, indicating weaker association between methanol-soluble lignin and PHB. 
FT-IR data confirmed that hydrogen bonding is the main interactive force between the reactive functional groups of lignin and the carbonyl group of PHB. In summary, the structural differences existing between the two lignins did not manifest themselves in the properties of their blends.
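
The Gordon-Taylor and Kwei models mentioned in the abstract predict the glass transition temperature (Tg) of a miscible blend from the component Tg values and weight fractions. The sketch below uses the standard textbook forms of both equations; the component Tg values and the fitting parameters `k` and `q` are placeholders, not measurements or fitted values from the thesis.

```python
def gordon_taylor(w1, tg1, tg2, k):
    """Gordon-Taylor: Tg = (w1*Tg1 + k*w2*Tg2) / (w1 + k*w2), with w2 = 1 - w1."""
    w2 = 1.0 - w1
    return (w1 * tg1 + k * w2 * tg2) / (w1 + k * w2)

def kwei(w1, tg1, tg2, k, q):
    """Kwei: Gordon-Taylor plus an interaction term q*w1*w2."""
    w2 = 1.0 - w1
    return gordon_taylor(w1, tg1, tg2, k) + q * w1 * w2

# Placeholder values in kelvin (component 1 = PHB, component 2 = lignin).
TG_PHB, TG_LIGNIN = 278.0, 400.0
K_FIT, Q_FIT = 0.5, 10.0   # hypothetical fitting parameters

# Predicted blend Tg at 0-40 wt% lignin, the range reported as miscible.
predictions = {
    lignin_wt: kwei(1.0 - lignin_wt, TG_PHB, TG_LIGNIN, K_FIT, Q_FIT)
    for lignin_wt in (0.0, 0.1, 0.2, 0.3, 0.4)
}
```

In the Kwei form, the parameter `q` is commonly read as a measure of specific interactions such as hydrogen bonding, which is consistent with the abstract's comparison of interaction parameters between the two lignins.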

Data modelling in the beginning school years

Relevance:

20.00%

Abstract:

This paper argues for a renewed focus on statistical reasoning in the beginning school years, with opportunities for children to engage in data modelling. Some of the core components of data modelling are addressed. A selection of results from the first data modelling activity implemented during the second year (2010; second grade) of a current longitudinal study is reported. Data modelling involves investigations of meaningful phenomena, deciding what is worthy of attention (identifying complex attributes), and then progressing to organising, structuring, visualising, and representing data. Reported here are children's abilities to identify diverse and complex attributes, sort and classify data in different ways, and create and interpret models to represent their data.

Data flow analysis of embedded program expressions

Relevance:

20.00%

Abstract:

Data flow analysis techniques can be used to help assess threats to data confidentiality and integrity in security critical program code. However, a fundamental weakness of static analysis techniques is that they overestimate the ways in which data may propagate at run time. Discounting large numbers of these false-positive data flow paths wastes an information security evaluator's time and effort. Here we show how to automatically eliminate some false-positive data flow paths by precisely modelling how classified data is blocked by certain expressions in embedded C code. We present a library of detailed data flow models of individual expression elements and an algorithm for introducing these components into conventional data flow graphs. The resulting models can be used to accurately trace byte-level or even bit-level data flow through expressions that are normally treated as atomic. This allows us to identify expressions that safely downgrade their classified inputs and thereby eliminate false-positive data flow paths from the security evaluation process. To validate the approach we have implemented and tested it in an existing data flow analysis toolkit.
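
The idea of tracing classified data through an expression at bit level can be sketched with taint masks, where a set bit marks a position that may still carry classified input. This is an illustrative toy, not the model library or algorithm from the paper; the function names, the 8-bit width, and the assumption that results are truncated to that width (as when assigned to a `uint8_t`) are all choices made for the example.

```python
WIDTH = 8
FULL = (1 << WIDTH) - 1   # mask for an 8-bit value

def taint_and_const(taint, const):
    """x & const: only bit positions where const is 1 can pass input bits."""
    return taint & const & FULL

def taint_or_const(taint, const):
    """x | const: positions forced to 1 by const carry no input information."""
    return taint & ~const & FULL

def taint_shl(taint, n):
    """x << n, truncated to WIDTH bits: high bits are shifted out and lost."""
    return (taint << n) & FULL

def taint_shr(taint, n):
    """x >> n (unsigned): low bits are discarded."""
    return taint >> n

SECRET = FULL  # every bit of the input byte is classified

# (secret & 0x0F) << 4 : the low nibble survives, moved to the high nibble.
leaky = taint_shl(taint_and_const(SECRET, 0x0F), 4)

# (secret & 0xF0) << 4 : the surviving bits are shifted out, so nothing leaks.
blocked = taint_shl(taint_and_const(SECRET, 0xF0), 4)
```

A conventional analysis that treats each expression as atomic would flag both results as classified, whereas the bit-level view shows the second expression yields a zero taint mask, which is the kind of false-positive path the abstract describes eliminating.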
