364 resultados para Dataset


Relevância:

10.00% 10.00%

Publicador:

Resumo:

Recent theoretical work has suggested “entrepreneurial capabilities” themselves may provide the resource foundations to deliver competitive advantage for entrepreneurial firms. This paper empirically examines how start-ups use such entrepreneurial capabilities to build competitive advantage. We investigate the effects of technological and marketing expertise, knowledge of market trends, flexibility and networking on the ability to obtain a cost leadership or differentiation advantage. Using a large dataset of 1,108 start-ups obtained after random sampling of over 30,193 households, we find that differentiation strategies benefit from most resource advantages. Cost leadership strategies, however, seem only to benefit from technological expertise and flexibility and not related to market-based advantages. By doing so, this study contributes to both entrepreneurship and RBV-theories by showing how entrepreneurial capabilities lead to competitive advantages in nascent and early-stage start-ups.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

To date, automatic recognition of semantic information such as salient objects and mid-level concepts from images is a challenging task. Since real-world objects tend to exist in a context within their environment, the computer vision researchers have increasingly incorporated contextual information for improving object recognition. In this paper, we present a method to build a visual contextual ontology from salient objects descriptions for image annotation. The ontologies include not only partOf/kindOf relations, but also spatial and co-occurrence relations. A two-step image annotation algorithm is also proposed based on ontology relations and probabilistic inference. Different from most of the existing work, we specially exploit how to combine representation of ontology, contextual knowledge and probabilistic inference. The experiments show that image annotation results are improved in the LabelMe dataset.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Aims: The Rural and Remote Road Safety Study (RRRSS) addresses a recognised need for greater research on road trauma in rural and remote Australia, the costs of which are disproportionately high compared with urban areas. The 5-year multi-phase study with whole-of-government support concluded in June 2008. Drawing on RRRSS data, we analysed fatal motorcycle crashes which occurred over 39 months to provide a description of crash characteristics, contributing factors and people involved. The descriptive analysis and discussion may inform development of tailored motorcycle safety interventions. Methods: RRRSS criteria sought vehicle crashes resulting in death or hospitalisation for 24 hours minimum of at least 1 person aged 16 years or over, in the study area defined roughly as the Queensland area north from Bowen in the east and Boulia in the west (excluding Townsville and Cairns urban areas). Fatal motorcycle crashes were selected from the RRRSS dataset. Analysis considered medical data covering injury types and severity, evidence of alcohol, drugs and prior medical conditions, as well as crash descriptions supplied by police to Queensland Transport on contributing circumstances, vehicle types, environmental conditions and people involved. Crash data were plotted in a geographic information system (MapInfo) for spatial analysis. Results: There were 23 deaths from 22 motorcycle crashes on public roads meeting RRRSS criteria. Of these, half were single vehicle crashes and half involved 2 or more vehicles. In contrast to general patterns for driver/rider age distribution in crashes, riders below 25 years of age were represented proportionally within the population. Riders in their thirties comprised 41% of fatalities, with a further 36% accounted for by riders in their fifties. 18 crashes occurred in the Far North Statistical Division (SD), with 2 crashes in both the Northern and North West SDs. Behavioural factors comprised the vast majority of contributing circumstances cited by police, with adverse environmental conditions noted in only 4 cases. Conclusions: Fatal motorcycle crashes were more likely to involve another vehicle and less likely to involve a young rider than non-fatal crashes recorded by the RRRSS. Rider behaviour contributed to the majority of crashes and should be a major focus of research, education and policy development, while other road users’ behaviour and awareness also remains important. With 68% of crashes occurring on major and secondary roads within a 130km radius of Cairns, efforts should focus on this geographic area.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Automatic detection of suspicious activities in CCTV camera feeds is crucial to the success of video surveillance systems. Such a capability can help transform the dumb CCTV cameras into smart surveillance tools for fighting crime and terror. Learning and classification of basic human actions is a precursor to detecting suspicious activities. Most of the current approaches rely on a non-realistic assumption that a complete dataset of normal human actions is available. This paper presents a different approach to deal with the problem of understanding human actions in video when no prior information is available. This is achieved by working with an incomplete dataset of basic actions which are continuously updated. Initially, all video segments are represented by Bags-Of-Words (BOW) method using only Term Frequency-Inverse Document Frequency (TF-IDF) features. Then, a data-stream clustering algorithm is applied for updating the system's knowledge from the incoming video feeds. Finally, all the actions are classified into different sets. Experiments and comparisons are conducted on the well known Weizmann and KTH datasets to show the efficacy of the proposed approach.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Phase-type distributions represent the time to absorption for a finite state Markov chain in continuous time, generalising the exponential distribution and providing a flexible and useful modelling tool. We present a new reversible jump Markov chain Monte Carlo scheme for performing a fully Bayesian analysis of the popular Coxian subclass of phase-type models; the convenient Coxian representation involves fewer parameters than a more general phase-type model. The key novelty of our approach is that we model covariate dependence in the mean whilst using the Coxian phase-type model as a very general residual distribution. Such incorporation of covariates into the model has not previously been attempted in the Bayesian literature. A further novelty is that we also propose a reversible jump scheme for investigating structural changes to the model brought about by the introduction of Erlang phases. Our approach addresses more questions of inference than previous Bayesian treatments of this model and is automatic in nature. We analyse an example dataset comprising lengths of hospital stays of a sample of patients collected from two Australian hospitals to produce a model for a patient's expected length of stay which incorporates the effects of several covariates. This leads to interesting conclusions about what contributes to length of hospital stay with implications for hospital planning. We compare our results with an alternative classical analysis of these data.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Query reformulation is a key user behavior during Web search. Our research goal is to develop predictive models of query reformulation during Web searching. This article reports results from a study in which we automatically classified the query-reformulation patterns for 964,780 Web searching sessions, composed of 1,523,072 queries, to predict the next query reformulation. We employed an n-gram modeling approach to describe the probability of users transitioning from one query-reformulation state to another to predict their next state. We developed first-, second-, third-, and fourth-order models and evaluated each model for accuracy of prediction, coverage of the dataset, and complexity of the possible pattern set. The results show that Reformulation and Assistance account for approximately 45% of all query reformulations; furthermore, the results demonstrate that the first- and second-order models provide the best predictability, between 28 and 40% overall and higher than 70% for some patterns. Implications are that the n-gram approach can be used for improving searching systems and searching assistance.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

An information filtering (IF) system monitors an incoming document stream to find the documents that match the information needs specified by the user profiles. To learn to use the user profiles effectively is one of the most challenging tasks when developing an IF system. With the document selection criteria better defined based on the users’ needs, filtering large streams of information can be more efficient and effective. To learn the user profiles, term-based approaches have been widely used in the IF community because of their simplicity and directness. Term-based approaches are relatively well established. However, these approaches have problems when dealing with polysemy and synonymy, which often lead to an information overload problem. Recently, pattern-based approaches (or Pattern Taxonomy Models (PTM) [160]) have been proposed for IF by the data mining community. These approaches are better at capturing sematic information and have shown encouraging results for improving the effectiveness of the IF system. On the other hand, pattern discovery from large data streams is not computationally efficient. Also, these approaches had to deal with low frequency pattern issues. The measures used by the data mining technique (for example, “support” and “confidences”) to learn the profile have turned out to be not suitable for filtering. They can lead to a mismatch problem. This thesis uses the rough set-based reasoning (term-based) and pattern mining approach as a unified framework for information filtering to overcome the aforementioned problems. This system consists of two stages - topic filtering and pattern mining stages. The topic filtering stage is intended to minimize information overloading by filtering out the most likely irrelevant information based on the user profiles. A novel user-profiles learning method and a theoretical model of the threshold setting have been developed by using rough set decision theory. The second stage (pattern mining) aims at solving the problem of the information mismatch. This stage is precision-oriented. A new document-ranking function has been derived by exploiting the patterns in the pattern taxonomy. The most likely relevant documents were assigned higher scores by the ranking function. Because there is a relatively small amount of documents left after the first stage, the computational cost is markedly reduced; at the same time, pattern discoveries yield more accurate results. The overall performance of the system was improved significantly. The new two-stage information filtering model has been evaluated by extensive experiments. Tests were based on the well-known IR bench-marking processes, using the latest version of the Reuters dataset, namely, the Reuters Corpus Volume 1 (RCV1). The performance of the new two-stage model was compared with both the term-based and data mining-based IF models. The results demonstrate that the proposed information filtering system outperforms significantly the other IF systems, such as the traditional Rocchio IF model, the state-of-the-art term-based models, including the BM25, Support Vector Machines (SVM), and Pattern Taxonomy Model (PTM).

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The wide range of contributing factors and circumstances surrounding crashes on road curves suggest that no single intervention can prevent these crashes. This paper presents a novel methodology, based on data mining techniques, to identify contributing factors and the relationship between them. It identifies contributing factors that influence the risk of a crash. Incident records, described using free text, from a large insurance company were analysed with rough set theory. Rough set theory was used to discover dependencies among data, and reasons using the vague, uncertain and imprecise information that characterised the insurance dataset. The results show that male drivers, who are between 50 and 59 years old, driving during evening peak hours are involved with a collision, had a lowest crash risk. Drivers between 25 and 29 years old, driving from around midnight to 6 am and in a new car has the highest risk. The analysis of the most significant contributing factors on curves suggests that drivers with driving experience of 25 to 42 years, who are driving a new vehicle have the highest crash cost risk, characterised by the vehicle running off the road and hitting a tree. This research complements existing statistically based tools approach to analyse road crashes. Our data mining approach is supported with proven theory and will allow road safety practitioners to effectively understand the dependencies between contributing factors and the crash type with the view to designing tailored countermeasures.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

For most of the work done in developing association rule mining, the primary focus has been on the efficiency of the approach and to a lesser extent the quality of the derived rules has been emphasized. Often for a dataset, a huge number of rules can be derived, but many of them can be redundant to other rules and thus are useless in practice. The extremely large number of rules makes it difficult for the end users to comprehend and therefore effectively use the discovered rules and thus significantly reduces the effectiveness of rule mining algorithms. If the extracted knowledge can’t be effectively used in solving real world problems, the effort of extracting the knowledge is worth little. This is a serious problem but not yet solved satisfactorily. In this paper, we propose a concise representation called Reliable Approximate basis for representing non-redundant approximate association rules. We prove that the redundancy elimination based on the proposed basis does not reduce the belief to the extracted rules. We also prove that all approximate association rules can be deduced from the Reliable Approximate basis. Therefore the basis is a lossless representation of approximate association rules.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

With the size and state of the Internet today, a good quality approach to organizing this mass of information is of great importance. Clustering web pages into groups of similar documents is one approach, but relies heavily on good feature extraction and document representation as well as a good clustering approach and algorithm. Due to the changing nature of the Internet, resulting in a dynamic dataset, an incremental approach is preferred. In this work we propose an enhanced incremental clustering approach to develop a better clustering algorithm that can help to better organize the information available on the Internet in an incremental fashion. Experiments show that the enhanced algorithm outperforms the original histogram based algorithm by up to 7.5%.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Association rule mining is one technique that is widely used when querying databases, especially those that are transactional, in order to obtain useful associations or correlations among sets of items. Much work has been done focusing on efficiency, effectiveness and redundancy. There has also been a focusing on the quality of rules from single level datasets with many interestingness measures proposed. However, with multi-level datasets now being common there is a lack of interestingness measures developed for multi-level and cross-level rules. Single level measures do not take into account the hierarchy found in a multi-level dataset. This leaves the Support-Confidence approach,which does not consider the hierarchy anyway and has other drawbacks, as one of the few measures available. In this paper we propose two approaches which measure multi-level association rules to help evaluate their interestingness. These measures of diversity and peculiarity can be used to help identify those rules from multi-level datasets that are potentially useful.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Association rule mining has made many advances in the area of knowledge discovery. However, the quality of the discovered association rules is a big concern and has drawn more and more attention recently. One problem with the quality of the discovered association rules is the huge size of the extracted rule set. Often for a dataset, a huge number of rules can be extracted, but many of them can be redundant to other rules and thus useless in practice. Mining non-redundant rules is a promising approach to solve this problem. In this paper, we firstly propose a definition for redundancy; then we propose a concise representation called Reliable basis for representing non-redundant association rules for both exact rules and approximate rules. An important contribution of this paper is that we propose to use the certainty factor as the criteria to measure the strength of the discovered association rules. With the criteria, we can determine the boundary between redundancy and non-redundancy to ensure eliminating as many redundant rules as possible without reducing the inference capacity of and the belief to the remaining extracted non-redundant rules. We prove that the redundancy elimination based on the proposed Reliable basis does not reduce the belief to the extracted rules. We also prove that all association rules can be deduced from the Reliable basis. Therefore the Reliable basis is a lossless representation of association rules. Experimental results show that the proposed Reliable basis can significantly reduce the number of extracted rules.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Despite all attempts to prevent fraud, it continues to be a major threat to industry and government. Traditionally, organizations have focused on fraud prevention rather than detection, to combat fraud. In this paper we present a role mining inspired approach to represent user behaviour in Enterprise Resource Planning (ERP) systems, primarily aimed at detecting opportunities to commit fraud or potentially suspicious activities. We have adapted an approach which uses set theory to create transaction profiles based on analysis of user activity records. Based on these transaction profiles, we propose a set of (1) anomaly types to detect potentially suspicious user behaviour and (2) scenarios to identify inadequate segregation of duties in an ERP environment. In addition, we present two algorithms to construct a directed acyclic graph to represent relationships between transaction profiles. Experiments were conducted using a real dataset obtained from a teaching environment and a demonstration dataset, both using SAP R/3, presently the most predominant ERP system. The results of this empirical research demonstrate the effectiveness of the proposed approach.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The automatic extraction of road features from remote sensed images has been a topic of great interest within the photogrammetric and remote sensing communities for over 3 decades. Although various techniques have been reported in the literature, it is still challenging to efficiently extract the road details with the increasing of image resolution as well as the requirement for accurate and up-to-date road data. In this paper, we will focus on the automatic detection of road lane markings, which are crucial for many applications, including lane level navigation and lane departure warning. The approach consists of four steps: i) data preprocessing, ii) image segmentation and road surface detection, iii) road lane marking extraction based on the generated road surface, and iv) testing and system evaluation. The proposed approach utilized the unsupervised ISODATA image segmentation algorithm, which segments the image into vegetation regions, and road surface based only on the Cb component of YCbCr color space. A shadow detection method based on YCbCr color space is also employed to detect and recover the shadows from the road surface casted by the vehicles and trees. Finally, the lane marking features are detected from the road surface using the histogram clustering. The experiments of applying the proposed method to the aerial imagery dataset of Gympie, Queensland demonstrate the efficiency of the approach.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Suicide has drawn much attention from both the scientific community and the public. Examining the impact of socio-environmental factors on suicide is essential in developing suicide prevention strategies and interventions, because it will provide health authorities with important information for their decision-making. However, previous studies did not examine the impact of socio-environmental factors on suicide using a spatial analysis approach. The purpose of this study was to identify the patterns of suicide and to examine how socio-environmental factors impact on suicide over time and space at the Local Governmental Area (LGA) level in Queensland. The suicide data between 1999 and 2003 were collected from the Australian Bureau of Statistics (ABS). Socio-environmental variables at the LGA level included climate (rainfall, maximum and minimum temperature), Socioeconomic Indexes for Areas (SEIFA) and demographic variables (proportion of Indigenous population, unemployment rate, proportion of population with low income and low education level). Climate data were obtained from Australian Bureau of Meteorology. SEIFA and demographic variables were acquired from ABS. A series of statistical and geographical information system (GIS) approaches were applied in the analysis. This study included two stages. The first stage used average annual data to view the spatial pattern of suicide and to examine the association between socio-environmental factors and suicide over space. The second stage examined the spatiotemporal pattern of suicide and assessed the socio-environmental determinants of suicide, using more detailed seasonal data. In this research, 2,445 suicide cases were included, with 1,957 males (80.0%) and 488 females (20.0%). In the first stage, we examined the spatial pattern and the determinants of suicide using 5-year aggregated data. Spearman correlations were used to assess associations between variables. Then a Poisson regression model was applied in the multivariable analysis, as the occurrence of suicide is a small probability event and this model fitted the data quite well. Suicide mortality varied across LGAs and was associated with a range of socio-environmental factors. The multivariable analysis showed that maximum temperature was significantly and positively associated with male suicide (relative risk [RR] = 1.03, 95% CI: 1.00 to 1.07). Higher proportion of Indigenous population was accompanied with more suicide in male population (male: RR = 1.02, 95% CI: 1.01 to 1.03). There was a positive association between unemployment rate and suicide in both genders (male: RR = 1.04, 95% CI: 1.02 to 1.06; female: RR = 1.07, 95% CI: 1.00 to 1.16). No significant association was observed for rainfall, minimum temperature, SEIFA, proportion of population with low individual income and low educational attainment. In the second stage of this study, we undertook a preliminary spatiotemporal analysis of suicide using seasonal data. Firstly, we assessed the interrelations between variables. Secondly, a generalised estimating equations (GEE) model was used to examine the socio-environmental impact on suicide over time and space, as this model is well suited to analyze repeated longitudinal data (e.g., seasonal suicide mortality in a certain LGA) and it fitted the data better than other models (e.g., Poisson model). The suicide pattern varied with season and LGA. The north of Queensland had the highest suicide mortality rate in all the seasons, while there was no suicide case occurred in the southwest. Northwest had consistently higher suicide mortality in spring, autumn and winter. In other areas, suicide mortality varied between seasons. This analysis showed that maximum temperature was positively associated with suicide among male population (RR = 1.24, 95% CI: 1.04 to 1.47) and total population (RR = 1.15, 95% CI: 1.00 to 1.32). Higher proportion of Indigenous population was accompanied with more suicide among total population (RR = 1.16, 95% CI: 1.13 to 1.19) and by gender (male: RR = 1.07, 95% CI: 1.01 to 1.13; female: RR = 1.23, 95% CI: 1.03 to 1.48). Unemployment rate was positively associated with total (RR = 1.40, 95% CI: 1.24 to 1.59) and female (RR=1.09, 95% CI: 1.01 to 1.18) suicide. There was also a positive association between proportion of population with low individual income and suicide in total (RR = 1.28, 95% CI: 1.10 to 1.48) and male (RR = 1.45, 95% CI: 1.23 to 1.72) population. Rainfall was only positively associated with suicide in total population (RR = 1.11, 95% CI: 1.04 to 1.19). There was no significant association for rainfall, minimum temperature, SEIFA, proportion of population with low educational attainment. The second stage is the extension of the first stage. Different spatial scales of dataset were used between the two stages (i.e., mean yearly data in the first stage, and seasonal data in the second stage), but the results are generally consistent with each other. Compared with other studies, this research explored the variety of the impact of a wide range of socio-environmental factors on suicide in different geographical units. Maximum temperature, proportion of Indigenous population, unemployment rate and proportion of population with low individual income were among the major determinants of suicide in Queensland. However, the influence from other factors (e.g. socio-culture background, alcohol and drug use) influencing suicide cannot be ignored. An in-depth understanding of these factors is vital in planning and implementing suicide prevention strategies. Five recommendations for future research are derived from this study: (1) It is vital to acquire detailed personal information on each suicide case and relevant information among the population in assessing the key socio-environmental determinants of suicide; (2) Bayesian model could be applied to compare mortality rates and their socio-environmental determinants across LGAs in future research; (3) In the LGAs with warm weather, high proportion of Indigenous population and/or unemployment rate, concerted efforts need to be made to control and prevent suicide and other mental health problems; (4) The current surveillance, forecasting and early warning system needs to be strengthened, to trace the climate and socioeconomic change over time and space and its impact on population health; (5) It is necessary to evaluate and improve the facilities of mental health care, psychological consultation, suicide prevention and control programs; especially in the areas with low socio-economic status, high unemployment rate, extreme weather events and natural disasters.