932 results for "data complexity"
Abstract:
Texture analysis and textural cues have been applied to image classification, segmentation and pattern recognition. Dominant texture descriptors include directionality, coarseness and line-likeness. In this dissertation, a class of textures known as particulate textures is defined; these textures are predominantly coarse or blob-like. The set of features that characterises particulate textures differs from the set that characterises classical textures. These features are micro-texture, macro-texture, size, shape and compaction. Classical texture analysis techniques do not adequately capture particulate texture features. This gap is identified, and new methods for analysing particulate textures are proposed. The levels of complexity in particulate textures are also presented. They range from the simplest images, where blob-like particles are easily isolated from their background, to more complex images, where the particles and the background are not easily separable or the particles are occluded. Simple particulate images can be analysed for particle shapes and sizes. Complex particulate texture images, on the other hand, often permit only the estimation of particle dimensions. Real-life applications of particulate textures are reviewed, including applications to sedimentology, granulometry and road surface texture analysis. A new framework for computing particulate shape is proposed. A granulometric approach to particle size estimation based on edge detection is developed, which can be adapted to the gray level of the images by varying its parameters. This study binds visual texture analysis and road surface macrotexture in a theoretical framework, making it possible to apply monocular imaging techniques to road surface texture analysis. 
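The abstract does not describe the edge-based granulometric estimator in detail. The sketch below only illustrates the general idea along a single gray-level scanline, under stated assumptions. It is not the dissertation's algorithm.

A step edge is declared wherever the gray-level jump reaches a threshold, and consecutive edge pairs are read as particle chords. The `threshold` parameter plays the role of the tunable parameter that adapts the method to image gray levels. The function names are hypothetical, and the scanline is assumed to start on background.

```python
def edge_positions(profile, threshold):
    """Return indices i where the step |profile[i+1] - profile[i]| reaches the threshold."""
    return [i for i in range(len(profile) - 1)
            if abs(profile[i + 1] - profile[i]) >= threshold]


def particle_chords(profile, threshold):
    """Chord lengths (in pixels) of particles crossed by a gray-level scanline.

    Assumes the scanline starts on background, so detected edges alternate
    between entering and leaving a particle.
    """
    edges = edge_positions(profile, threshold)
    return [edges[k + 1] - edges[k] for k in range(0, len(edges) - 1, 2)]


def mean_chord_length(profile, threshold):
    """Mean particle chord length, or None if no particle was detected."""
    chords = particle_chords(profile, threshold)
    return sum(chords) / len(chords) if chords else None


# Hypothetical scanline: two bright particles (3 px and 2 px wide) on a dark background.
scanline = [0, 0, 10, 10, 10, 0, 0, 10, 10, 0]
print(mean_chord_length(scanline, threshold=5))   # 2.5
print(mean_chord_length(scanline, threshold=15))  # None: threshold above this image's contrast
```

Raising the threshold suppresses weak edges (noise) but can miss low-contrast particles. This trade-off is why such a parameter would need tuning per image.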
Results from applying the developed algorithm to road surface macrotexture are compared with results based on Fourier spectra, the autocorrelation function and wavelet decomposition, indicating the superior performance of the proposed technique. The influence of image acquisition conditions, such as illumination and camera angle, on the results was systematically analysed. Experimental data were collected from over 5 km of road in Brisbane, and the estimated coarseness along the road was compared with laser profilometer measurements. A coefficient of determination (R²) exceeding 0.9 was obtained when correlating the proposed imaging technique with the state-of-the-art Sensor Measured Texture Depth (SMTD) obtained using laser profilometers.
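The agreement between the imaging estimate and SMTD is reported as a coefficient of determination. For a simple least-squares line fitted to paired readings, R² equals the squared Pearson correlation.

The helper below computes that value. The variable names and numbers are hypothetical and are not data from the Brisbane survey.

```python
def r_squared(x, y):
    """Coefficient of determination of a least-squares line fitted to (x, y).

    For simple linear regression this equals the squared Pearson correlation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical paired readings (not data from the study).
image_coarseness = [1.0, 2.0, 3.0]
smtd_mm = [1.0, 3.0, 2.0]
print(r_squared(image_coarseness, smtd_mm))  # 0.25
```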
Abstract:
Data preprocessing is widely recognized as an important stage in anomaly detection. This paper reviews the data preprocessing techniques used by anomaly-based network intrusion detection systems (NIDS), concentrating on which aspects of the network traffic are analyzed, and what feature construction and selection methods have been used. Motivation for the paper comes from the large impact data preprocessing has on the accuracy and capability of anomaly-based NIDS. The review finds that many NIDS limit their view of network traffic to the TCP/IP packet headers. Time-based statistics can be derived from these headers to detect network scans, network worm behavior, and denial of service attacks. A number of other NIDS perform deeper inspection of request packets to detect attacks against network services and network applications. More recent approaches analyze full service responses to detect attacks targeting clients. The review covers a wide range of NIDS, highlighting which classes of attack are detectable by each of these approaches. Data preprocessing is found to rely predominantly on expert domain knowledge for identifying the most relevant parts of network traffic and for constructing the initial candidate set of traffic features. On the other hand, automated methods have been widely used for feature extraction to reduce data dimensionality, and for feature selection to find the most relevant subset of features from this candidate set. The review shows a trend toward deeper packet inspection to construct more relevant features through targeted content parsing. These context-sensitive features are required to detect current attacks.
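The review mentions time-based statistics derived from packet headers for scan detection. One such statistic is the number of distinct destination endpoints a source contacts within a time window.

The sketch below computes it from header tuples and flags sources above a threshold. The tuple layout, function names and `max_targets` value are hypothetical. This is a toy statistic, not a method taken from any reviewed NIDS.

```python
from collections import defaultdict


def distinct_targets_per_source(packets, window_start, window_len):
    """Count distinct (destination IP, destination port) pairs per source in one time window.

    packets: iterable of (timestamp_s, src_ip, dst_ip, dst_port) header tuples.
    """
    targets = defaultdict(set)
    for ts, src, dst, port in packets:
        if window_start <= ts < window_start + window_len:
            targets[src].add((dst, port))
    return {src: len(t) for src, t in targets.items()}


def flag_possible_scanners(packets, window_start, window_len, max_targets):
    """Sources contacting more than max_targets distinct endpoints in the window."""
    counts = distinct_targets_per_source(packets, window_start, window_len)
    return sorted(src for src, n in counts.items() if n > max_targets)


# Hypothetical traffic: one host probes 20 ports, another repeatedly uses port 80.
scan = [(i * 0.1, "10.0.0.5", "10.0.0.9", port) for i, port in enumerate(range(1, 21))]
normal = [(float(t), "10.0.0.7", "10.0.0.9", 80) for t in range(5)]
print(flag_possible_scanners(scan + normal, window_start=0.0, window_len=10.0, max_targets=10))
```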
Abstract:
Women are substantially under-represented in the professoriate in Australia with a ratio of one female professor to every three male professors. This gender imbalance has been an ongoing concern with various affirmative action programs implemented in universities but to limited effect. Hence, there is a need to investigate the catalysts for and inhibitors to women’s ascent to the professoriate. This investigation focussed on women appointed to the professoriate between 2005, when a research quality assessment was first proposed, and 2008. Henceforth, these women are referred to as “New Women Professors”. The catalysts and inhibitors in these women’s careers were investigated through an electronic survey and focus group interviews. The survey was administered to new women professors (n=255) and new men professors (n=240) to enable a comparison of responses. However, only women participated in focus group discussions (n=21). An analysis of the survey and interview data revealed that the most critical catalysts for women’s advancement to the professoriate were equal employment opportunities and mentoring. Equal opportunity initiatives provided women with access to traditionally male-dominated forums. Mentoring gave women an insider perspective on the complexity of academia and the politics of the academy. The key inhibitors to women’s career advancement were negative discrimination, the culture of the boys’ club, the tension between personal and professional life, and isolation. Negative discrimination and the boys’ club are problematic because they favour men and marginalise women. The tension between personal and professional life is a particular concern for women who bear children and typically assume the major role in a family for child rearing. Isolation was a concern for both women and men with isolation appearing to increase after ascent to the professoriate. 
Knowledge of the significant catalysts and inhibitors provides a pragmatic way to orient universities towards redressing the gender balance in the professoriate.
Abstract:
This study aimed to explore resilience and wellbeing among a group of eight refugee women originating from several countries (mainly African) and living in Brisbane, most of whom were single mothers. To challenge mostly quantitative and gender-blind explorations of mental health concepts among refugee groups, the project sought an emic and contextual understanding of resilience and wellbeing. Established perspectives, while useful, tend to overlook the complexities of refugee mental health experiences and can neglect the dense nature of individual stories. The purpose of my study was to contest relatively simplistic narratives of mental health constructs that tend to dominate migrant and refugee studies and influence practice paradigms in the human services field. In this ethnographic exploration of mental health constructs, conducted in 2008 and 2009, the use of in-depth interviews, participant observations, and visual ethnographic elements provided an opportunity for refugee women to tell their own stories. The participants’ unique narratives of pre- and post-migration experiences, shaped by specific gender, age, social, cultural and political aspects prevailing in their lives, yielded ‘thick’ ethnographic description (Geertz, 1973) of their social worlds. The findings explored in this study, namely language issues, the impact of community dynamics, and the single status of refugee women, clearly demonstrate that mental health constructs are fluid, multifaceted and complex in reality. In fact, language, community dynamics, and being a single mother represented both opportunities and barriers in the lives of participants. In some contexts, these factors were conducive to resilience and wellbeing, while in other circumstances, these three elements acted as a hindrance to positive mental health outcomes. 
There are multiple dimensions to the findings, signifying that the social worlds of refugee women cannot be simplified using set definitions and neat notions of resilience and wellbeing. Instead, the intricacies and complexities embedded in the mundane of the everyday highlight novel conceptualisations of resilience and wellbeing. Based on the particular circumstances of single refugee mothers, whose experiences differ from those of married women, this thesis presents novel articulations of mental health constructs, as an alternative view to existing trends in the literature on refugee issues. Rich and multi-dimensional meanings associated with the socio-cultural determinants of mental health emerged in the process. This thesis’ findings highlight a significant gap in diasporic studies as well as simplistic assumptions about refugee women’s resettlement experiences. Single refugee women’s distinct issues are so complex and dense that a contextual approach is critical to yield accurate depictions of their circumstances. It is therefore essential to understand refugee lived experiences within broader socio-political contexts to truly appreciate the depth of these narratives. In this manner, critical aspects salient to refugee journeys can inform different understandings of resilience, wellbeing and mental health, and shape contemporary policy and human service practice paradigms.
Abstract:
Manually constructing domain-specific sentiment lexicons is extremely time-consuming, and it may not even be feasible for domains where linguistic expertise is not available. Research on the automatic construction of domain-specific sentiment lexicons has therefore become a hot topic in recent years. The main contribution of this paper is a novel semi-supervised learning method that exploits both term-to-term and document-to-term relations hidden in a corpus to construct domain-specific sentiment lexicons. More specifically, the proposed two-pass pseudo-labeling method combines shallow linguistic parsing and corpus-based statistical learning. This makes domain-specific sentiment extraction scalable with respect to the sheer volume of opinionated documents archived on the Internet. Another novelty of the proposed method is that it can utilize the readily available user-contributed labels of opinionated documents (e.g., the user ratings of product reviews) to bootstrap the performance of sentiment lexicon construction. Our experiments show that the proposed method can generate high-quality domain-specific sentiment lexicons, as directly assessed by human experts. Moreover, the system-generated domain-specific sentiment lexicons improve document-level polarity prediction by 2.18% compared with other well-known baseline methods. Our research opens the door to the development of practical and scalable methods for domain-specific sentiment analysis.
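The idea of using user ratings as readily available pseudo-labels can be illustrated with a deliberately simplified, single-pass sketch. It is not the paper's two-pass method and involves no linguistic parsing.

Reviews rated 4 or 5 are treated as positive and those rated 1 or 2 as negative; both cut-offs are assumptions. Each term is then scored by the smoothed log-odds of appearing in positive versus negative documents. All names are hypothetical.

```python
import math
from collections import Counter


def pseudo_label(rating, pos_min=4, neg_max=2):
    """Map a user rating (assumed 1-5 stars) to +1, -1, or 0 (ignored)."""
    if rating >= pos_min:
        return 1
    if rating <= neg_max:
        return -1
    return 0


def term_polarity(docs):
    """Smoothed log-odds that a term occurs in pseudo-positive rather than pseudo-negative documents.

    docs: list of (tokens, rating) pairs.
    """
    pos_df, neg_df = Counter(), Counter()
    n_pos = n_neg = 0
    for tokens, rating in docs:
        label = pseudo_label(rating)
        if label == 1:
            n_pos += 1
            pos_df.update(set(tokens))
        elif label == -1:
            n_neg += 1
            neg_df.update(set(tokens))
    scores = {}
    for term in set(pos_df) | set(neg_df):
        p = (pos_df[term] + 1) / (n_pos + 2)  # Laplace-smoothed document frequency
        q = (neg_df[term] + 1) / (n_neg + 2)
        scores[term] = math.log(p / q)
    return scores


reviews = [("great battery".split(), 5), ("great screen".split(), 4),
           ("poor battery".split(), 1), ("ok screen".split(), 3)]
scores = term_polarity(reviews)
print(scores["great"] > 0, scores["poor"] < 0, "ok" in scores)  # True True False
```

Terms that occur only in neutral (3-star) reviews receive no score. A real system would also need the term-to-term relations the abstract mentions to score such terms.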
Abstract:
This thesis investigates profiling and differentiating customers through the use of statistical data mining techniques. The business application of our work centres on examining individuals’ seldom-studied yet critical consumption behaviour over an extensive time period within the context of the wireless telecommunication industry. Consumption behaviour (as opposed to purchasing behaviour) is behaviour that has been performed so frequently that it has become habitual and involves minimal intention or decision making. Key variables investigated are the timestamp at which each activity was initiated and the cell tower location, as well as the activity type and usage quantity (e.g., a voice call with its duration in seconds). The research focuses on customers’ spatial and temporal usage behaviour. The main methodological emphasis is on the development of clustering models based on Gaussian mixture models (GMMs), which are fitted using the recently developed variational Bayesian (VB) method. VB is an efficient deterministic alternative to the popular but computationally demanding Markov chain Monte Carlo (MCMC) methods. The standard VB-GMM algorithm is extended by allowing component splitting, so that it is robust to initial parameter choices and can automatically and efficiently determine the number of components. The new algorithm we propose allows more effective modelling of individuals’ highly heterogeneous and spiky spatial usage behaviour, or more generally of human mobility patterns; the term spiky describes data patterns with large areas of low probability mixed with small areas of high probability. Customers are then characterised and segmented based on the fitted GMM, which corresponds to how each of them uses the products/services spatially in their daily lives; this essentially reflects their likely lifestyle and occupational traits. 
Other significant research contributions include fitting GMMs using VB to circular data, i.e., the temporal usage behaviour, and developing clustering algorithms suitable for high-dimensional data based on the use of VB-GMM.
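Temporal usage behaviour is circular: 23:00 and 01:00 are two hours apart, not twenty-two. This is why time of day needs circular treatment rather than an ordinary Gaussian model.

The sketch below shows only the underlying idea by computing a circular mean of clock times. It does not fit a VB-GMM. The function name and the call times are hypothetical.

```python
import math


def circular_mean_hour(hours):
    """Circular mean of clock times given in hours on a 24-hour cycle."""
    angles = [2 * math.pi * h / 24 for h in hours]
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return (math.atan2(s, c) * 24 / (2 * math.pi)) % 24


calls = [23.0, 1.0]  # hypothetical call start times either side of midnight
print(sum(calls) / len(calls))    # 12.0: the naive mean points to midday
print(circular_mean_hour(calls))  # approximately midnight (0, or just below 24 from rounding)
```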
Abstract:
Managing project-based learning is becoming an increasingly important part of project management. This article presents a comparative case study of 12 cases of knowledge transfer between temporary inter-organizational projects and permanent parent organizations. Our set-theoretic analysis of these data yields two major findings. First, a high level of absorptive capacity of the project owner is a necessary condition for successful project knowledge transfer. This implies that responsibility for knowledge transfer seems to lie in the first place with the project's parent organization, not with the project manager. Second, none of the factors is sufficient by itself. This implies that successful project knowledge transfer is a complex process that always involves configurations of multiple factors. We link these implications with the view of projects as complex temporary organizational forms in which successful project managers need to cope with complexity by simultaneously paying attention to both relational and organizational processes.
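Necessity and sufficiency claims are commonly assessed with consistency scores in set-theoretic (QCA-style) analysis. The abstract does not state which measures were used, so the sketch below assumes this standard pair of formulas.

- Necessity consistency is Σ min(xᵢ, yᵢ) / Σ yᵢ.
- Sufficiency consistency is Σ min(xᵢ, yᵢ) / Σ xᵢ.

Here xᵢ and yᵢ are the set memberships of condition and outcome in case i. The membership scores below are hypothetical and are not the article's 12 cases.

```python
def necessity_consistency(x, y):
    """Degree to which outcome y is a subset of condition x (x necessary for y)."""
    return sum(min(a, b) for a, b in zip(x, y)) / sum(y)


def sufficiency_consistency(x, y):
    """Degree to which condition x is a subset of outcome y (x sufficient for y)."""
    return sum(min(a, b) for a, b in zip(x, y)) / sum(x)


# Hypothetical crisp memberships: high absorptive capacity (x), successful transfer (y).
capacity = [1, 1, 1, 1, 0, 1]
success = [1, 1, 1, 0, 0, 0]
print(necessity_consistency(capacity, success))    # 1.0: every success has high capacity
print(sufficiency_consistency(capacity, success))  # 0.6: high capacity alone is not enough
```

The same functions accept fuzzy memberships in [0, 1]. The toy pattern mirrors the shape of the reported finding: a condition can be necessary without being sufficient.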
Abstract:
In many applications, e.g., bioinformatics, web access traces and system utilisation logs, the data are naturally in the form of sequences. There is great interest in analysing sequential data and finding the inherent characteristics or relationships within them. Sequential association rule mining is one possible method for analysing such data. Conventional sequential association rule mining very often generates a huge number of association rules, many of which are redundant, so it is desirable to find a way to eliminate the unnecessary rules. Because of the complexity and temporally ordered nature of sequential data, current research on sequential association rule mining is limited. Several sequential association rule prediction models using either sequence constraints or temporal constraints have been proposed, but none of them considered the redundancy problem in rule mining. The main contribution of this research is a non-redundant association rule mining method based on closed frequent sequences and minimal sequential generators. We also give a definition of non-redundant sequential rules: sequential rules with minimal antecedents but maximal consequents. A new algorithm called CSGM (closed sequential and generator mining) for generating closed sequences and minimal sequential generators is also introduced. Further experiments compare the performance of generating non-redundant sequential rules with that of generating full sequential rules. They also evaluate the performance of CSGM against other closed sequential pattern mining and generator mining algorithms. We also use the generated non-redundant sequential rules for query expansion in order to improve recommendations for infrequently purchased products.
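The notions of closed sequences, minimal generators and rules with minimal antecedents but maximal consequents can be made concrete with a brute-force sketch on a tiny database. This is not CSGM, which is designed to avoid exhaustive enumeration.

The sketch makes three simplifying assumptions:

- Sequences are plain item sequences rather than itemset sequences.
- Generators are compared only against non-empty proper subsequences.
- A rule g ⇒ r is formed when a closed sequence starts with the generator g.

All function names are hypothetical.

```python
from itertools import combinations


def is_subsequence(pattern, seq):
    """True if pattern occurs in seq in order (not necessarily contiguously)."""
    it = iter(seq)
    return all(item in it for item in pattern)


def frequent_sequences(db, minsup):
    """Brute force: support of every subsequence of every database sequence."""
    candidates = set()
    for seq in db:
        for r in range(1, len(seq) + 1):
            candidates.update(combinations(seq, r))  # combinations keep item order
    support = {p: sum(is_subsequence(p, s) for s in db) for p in candidates}
    return {p: s for p, s in support.items() if s >= minsup}


def closed_and_generators(freq):
    closed, generators = set(), set()
    for p, sp in freq.items():
        # Closed: no proper supersequence has the same support.
        if not any(len(q) > len(p) and sq == sp and is_subsequence(p, q) for q, sq in freq.items()):
            closed.add(p)
        # Generator: no proper (non-empty) subsequence has the same support.
        if not any(len(q) < len(p) and sq == sp and is_subsequence(q, p) for q, sq in freq.items()):
            generators.add(p)
    return closed, generators


def nonredundant_rules(freq, closed, generators):
    """Rules g => rest, where g is a generator and g + rest is a closed sequence."""
    rules = []
    for g in generators:
        for c in closed:
            if len(c) > len(g) and c[:len(g)] == g:
                rules.append((g, c[len(g):], freq[c] / freq[g]))
    return rules


db = [("a", "b", "c"), ("a", "b", "c"), ("a", "c"), ("b", "c")]
freq = frequent_sequences(db, minsup=2)
closed, generators = closed_and_generators(freq)
for antecedent, consequent, confidence in sorted(nonredundant_rules(freq, closed, generators)):
    print(antecedent, "=>", consequent, round(confidence, 2))
```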
Abstract:
High levels of sitting have been linked with poor health outcomes. Previously, a pragmatic MTI accelerometer cut-point of 100 counts·min⁻¹ has been used to estimate sitting. Data on the accuracy of this cut-point are unavailable. PURPOSE: To ascertain whether the 100 counts·min⁻¹ cut-point accurately isolates sitting from standing activities. METHODS: Participants fitted with an MTI accelerometer were observed performing a range of sitting, standing, light and moderate activities. MTI data in 1-min epochs were matched to the observed activities, then re-categorised as either sitting or not sitting using the 100 counts·min⁻¹ cut-point. Self-reported demographics and current physical activity were collected. Generalized estimating equation (GEE) analyses for repeated measures with a binary logistic model, corrected for age, gender and BMI, were conducted to ascertain the odds of the MTI data being misclassified. RESULTS: Data were from 26 healthy subjects (8 men; 50% aged <25 years; mean (SD) BMI 22.7 (3.8) kg/m²). The mode of the MTI sitting and standing data was 0 counts·min⁻¹, with 46% of sitting activities and 21% of standing activities recording 0 counts·min⁻¹. The GEE was unable to accurately isolate sitting from standing activities using the 100 counts·min⁻¹ cut-point, since all sitting activities were incorrectly predicted as standing (p=0.05). To further explore the sensitivity of MTI data for delineating sitting from standing, the data were re-categorised using the upper 95% confidence limit of the mean for the sitting activities (46 counts·min⁻¹). This resulted in the GEE correctly classifying 49% of sitting and 69% of standing activities. Using the 100 counts·min⁻¹ cut-point, the data were re-categorised into a combined ‘sit/stand’ category and tested against other light activities: 88% of sit/stand and 87% of light activities were accurately predicted. Using Freedson’s moderate cut-point of 1952 counts·min⁻¹, the GEE accurately predicted 97% of light vs. 90% of moderate activities. 
CONCLUSION: The distributions of MTI-recorded sitting and standing data overlap considerably; as such, the 100 counts·min⁻¹ cut-point did not accurately isolate sitting from other static standing activities. The 100 counts·min⁻¹ cut-point more accurately predicted sit/stand vs. other movement-orientated activities.
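The re-categorisation step labels each 1-min epoch as sitting when its count falls below the cut-point. Classification success per observed activity can then be tabulated. The sketch below does this with raw agreement proportions only.

The study itself used GEE models adjusted for age, gender and BMI, which this sketch does not reproduce. The epoch counts are hypothetical and are not the study's data.

```python
def agreement_by_activity(counts, observed, cut_point=100):
    """Share of epochs in each observed class that the cut-point classifies correctly.

    An epoch is labelled 'sitting' when its count (counts per minute) is below cut_point.
    """
    correct, total = {}, {}
    for count, activity in zip(counts, observed):
        predicted_sitting = count < cut_point
        is_correct = predicted_sitting == (activity == "sitting")
        total[activity] = total.get(activity, 0) + 1
        correct[activity] = correct.get(activity, 0) + int(is_correct)
    return {a: correct[a] / total[a] for a in total}


# Hypothetical 1-min epochs: four observed sitting, four observed standing.
epoch_counts = [0, 0, 20, 150, 0, 60, 120, 200]
observed = ["sitting"] * 4 + ["standing"] * 4
print(agreement_by_activity(epoch_counts, observed, cut_point=100))  # {'sitting': 0.75, 'standing': 0.5}
print(agreement_by_activity(epoch_counts, observed, cut_point=46))   # {'sitting': 0.75, 'standing': 0.75}
```

Standing epochs that record 0 counts are misclassified at any positive cut-point. This illustrates why overlapping distributions limit what any single threshold can achieve.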
Abstract:
Dealing with product yield and quality in manufacturing industries is becoming more difficult due to the increasing volume and complexity of data and expectations of a quicker time to market. Data mining offers tools for the quick discovery of relationships, patterns and knowledge in large databases. The growing self-organizing map (GSOM) is established as an efficient unsupervised data mining algorithm. In this study, some modifications to the original GSOM are proposed for improving manufacturing yield through clustering. These modifications include a clustering quality measure to evaluate the performance of the programme in separating good and faulty products, and a filtering index to reduce noise in the dataset. Results show that the proposed method is able to effectively differentiate good and faulty products. It will help engineers construct a knowledge base to predict product quality automatically from collected data and provide insights for yield improvement.
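The abstract does not define its clustering quality measure. As a stand-in, the sketch below uses cluster purity, a standard measure that is not necessarily the paper's. It scores how well clusters separate good from faulty products: each cluster is credited with its majority class, and the credited items are divided by the total. The cluster assignments and labels are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_purity(assignments, labels):
    """Fraction of items that belong to the majority class of their cluster."""
    by_cluster = defaultdict(Counter)
    for cluster, label in zip(assignments, labels):
        by_cluster[cluster][label] += 1
    majority_total = sum(c.most_common(1)[0][1] for c in by_cluster.values())
    return majority_total / len(labels)


# Hypothetical map-node assignments for six products.
nodes = [0, 0, 0, 1, 1, 1]
quality = ["good", "good", "faulty", "faulty", "faulty", "faulty"]
print(cluster_purity(nodes, quality))  # 5 of 6 products sit in their cluster's majority class
```

Purity rises trivially as clusters become smaller, so any measure used to compare map sizes would need a penalty for the number of clusters.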
Abstract:
Most learning paradigms impose a particular syntax on the class of concepts to be learned; the chosen syntax can dramatically affect whether the class is learnable or not. Consider classification paradigms, where the task is to determine whether the underlying world does or does not have a particular property. There, how that property is represented has no bearing on the power of a classifier that just outputs 1s or 0s. But is it possible to give a canonical syntactic representation of the class of concepts that are classifiable according to the particular criteria of a given paradigm? We provide a positive answer to this question for classification in the limit paradigms in a logical setting, with ordinal mind change bounds as a measure of complexity. The syntactic characterization that emerges enables us to derive that if a possibly noncomputable classifier can perform the task assigned to it by the paradigm, then a computable classifier can also perform the same task. The syntactic characterization is strongly related to the difference hierarchy over the class of open sets of some topological space; this space is naturally defined from the class of possible worlds and possible data of the learning paradigm.
Abstract:
The question of how to implement evidence effectively reveals a deficiency in our knowledge and understanding of the compound factors involved in such a process (Kitson, Rycroft-Malone et al. 2008). Although there is some awareness of the complexities of the process, there has been little exploration of the effectiveness of implementing evidence-based programs in health care. Despite public awareness of the dangers of smoking in pregnancy, and widespread public health measures to prevent smoking-related disease, women still continue to smoke in pregnancy (Ananth, Savitz et al. 1997; Laws and Hilder 2008). Evaluation of public health measures concludes that smoking cessation interventions during pregnancy increase quit rates among pregnant women (Melvin, Dolan-Mullen et al. 2000; Albrecht, Maloni et al. 2004; Lumley, Oliver et al. 2007). Notwithstanding the potential for improvement in health outcomes for pregnant women and their unborn babies, smoking interventions are often conducted poorly or not at all. Midwives understand why women smoke in pregnancy and parenthood and are aware of the risks of smoking to both the pregnancy and the unborn child. Even so, they require specific knowledge and skills in the provision of support and advice on smoking for pregnant women (Bull and Whitehead 2006). Organisational-change research demonstrates the complexity of the process of planned change in professionalised institutions such as health care (Greenhalgh, Robert et al. 2005). Some innovations and interventions are never accepted, and others are poorly supported (Greenhalgh, Robert et al. 2004). Comprehension of the change process around health promotion is crucial to the implementation of new health promotion interventions within health care (Riley, Taylor et al. 2003). 
This study used a case study approach to explore the process of implementing a smoking cessation training program for midwives in Queensland metropolitan and regional clinical areas who attended a ‘Train-the-Trainer program’. The study draws on the organisational change work of Greenhalgh et al. (2004) as the theoretical framework through which situational and structural factors are explored and examined as they inform the implementation of smoking cessation programs. The research data consisted of staged interviews with midwives who instituted training programs for midwives, as well as organisational and policy documentation. Analysis of the data identified some areas that were not fully addressed in the theoretical model; these formed the basis of the Discussion and Implications for Future Research.
Abstract:
The aim of this study is to assess the potential use of Bluetooth data for traffic monitoring of arterial road networks. Bluetooth data provide direct measurement of travel time between pairs of scanners, and intensive research has been reported on this topic. Bluetooth data also include “Duration” data, which represent the time a Bluetooth device spends passing through the detection range of a Bluetooth scanner. If the scanners are located at signalised intersections, this Duration can be related to intersection performance, and hence represents valuable information for traffic monitoring. However, the use of Duration has been ignored in previous analyses. In this study, the Duration data as well as the travel time data are analysed to capture the traffic conditions of a main arterial route in Brisbane. The data consist of one week of Bluetooth data provided by Brisbane City Council. In addition, microsimulation analysis is conducted to further investigate the properties of Duration. The results reveal characteristics of Duration and identify future research needed to utilise this valuable data source.
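Both quantities in the abstract can be derived from raw scanner detections. Duration is taken here as the span between a device's first and last detection at one scanner. Travel time between two scanners is taken as the gap between first detections.

The first-to-first matching rule, the single-pass-per-device assumption and the record layout `(mac, scanner, time_s)` are all assumptions made for illustration. Real data would need trip segmentation and outlier filtering. The device IDs and times are hypothetical.

```python
def detection_windows(detections):
    """Map (mac, scanner) -> (first_seen_s, last_seen_s) from raw (mac, scanner, time_s) records."""
    windows = {}
    for mac, scanner, t in detections:
        first, last = windows.get((mac, scanner), (t, t))
        windows[(mac, scanner)] = (min(first, t), max(last, t))
    return windows


def durations(windows, scanner):
    """Time each device spent within one scanner's detection range."""
    return {mac: last - first for (mac, s), (first, last) in windows.items() if s == scanner}


def travel_times(windows, origin, dest):
    """First-to-first travel time for devices seen at origin and later at dest."""
    out = {}
    for (mac, s), (first, _) in windows.items():
        if s == origin and (mac, dest) in windows:
            dt = windows[(mac, dest)][0] - first
            if dt > 0:
                out[mac] = dt
    return out


# Hypothetical detections at two scanners, A (upstream) and B (downstream).
detections = [("m1", "A", 0.0), ("m1", "A", 35.0), ("m1", "B", 120.0), ("m1", "B", 130.0),
              ("m2", "A", 10.0), ("m2", "A", 12.0)]
w = detection_windows(detections)
print(durations(w, "A"))           # {'m1': 35.0, 'm2': 2.0}
print(travel_times(w, "A", "B"))   # {'m1': 120.0}
```

A long Duration at a scanner placed at a signalised intersection (m1 above) would be consistent with a vehicle waiting at a red light. That link is what makes Duration interesting for intersection monitoring.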
Abstract:
Traffic simulation models tend to have their own data input and output formats. In an effort to standardise the input for traffic simulations, we introduce in this paper a set of data marts that aim to serve as a common interface between the necessary data, stored in dedicated databases, and the software packages that require the input in a certain format. The data marts are developed based on real-world objects (e.g. roads, traffic lights, controllers) rather than abstract models. They therefore contain all the necessary information, which the importing software package can transform to suit its needs. The paper contains a full description of the data marts for network coding, simulation results and scenario management, which have been discussed with industry partners to ensure sustainability.
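The design idea is to store real-world objects once and let each simulation package transform them into its own input format. The sketch below illustrates it with plain dataclasses.

Every class, field and the `to_link_records` importer are hypothetical; the paper's actual data mart schemas are not given here. A simple referential-integrity check is included because a shared interface is only useful if cross-references (light to road, light to controller) stay consistent.

```python
from dataclasses import dataclass, field


@dataclass
class Road:
    road_id: str
    from_node: str
    to_node: str
    length_m: float
    lanes: int
    speed_limit_kmh: float


@dataclass
class Controller:
    controller_id: str
    cycle_time_s: float
    phase_splits_s: list = field(default_factory=list)


@dataclass
class TrafficLight:
    light_id: str
    road_id: str
    controller_id: str


def to_link_records(roads):
    """Hypothetical importer for one simulator: real-world roads -> link dicts (speed in m/s)."""
    return [{"id": r.road_id, "from": r.from_node, "to": r.to_node,
             "length_m": r.length_m, "lanes": r.lanes,
             "vmax_mps": round(r.speed_limit_kmh / 3.6, 2)} for r in roads]


def dangling_lights(lights, roads, controllers):
    """IDs of traffic lights that reference an unknown road or controller."""
    road_ids = {r.road_id for r in roads}
    controller_ids = {c.controller_id for c in controllers}
    return [light.light_id for light in lights
            if light.road_id not in road_ids or light.controller_id not in controller_ids]


roads = [Road("R1", "N1", "N2", 250.0, 2, 60.0)]
controllers = [Controller("C1", 90.0, [40.0, 50.0])]
lights = [TrafficLight("L1", "R1", "C1"), TrafficLight("L2", "R9", "C1")]
print(to_link_records(roads)[0]["vmax_mps"])        # 16.67
print(dangling_lights(lights, roads, controllers))  # ['L2']
```

Unit conversion (km/h to m/s) happens in the importer rather than in the stored objects. This keeps the data mart package-neutral.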