364 resultados para Dataset


Relevância:

10.00% 10.00%

Publicador:

Resumo:

A hierarchical structure is used to represent the content of the semi-structured documents such as XML and XHTML. The traditional Vector Space Model (VSM) is not sufficient to represent both the structure and the content of such web documents. Hence in this paper, we introduce a novel method of representing the XML documents in Tensor Space Model (TSM) and then utilize it for clustering. Empirical analysis shows that the proposed method is scalable for a real-life dataset as well as the factorized matrices produced from the proposed method helps to improve the quality of clusters due to the enriched document representation with both the structure and the content information.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Research is indicating that individuals who present for DUI treatment may have competing substance abuse and mental health needs. This study aimed to examine the extent of such comorbidity issues among a sample of Texas DUI offenders. Method: Records of 36,372 DUI clients and 308,695 non-DUI clients admitted to Texas treatment programs between 2005 and 2008 were obtained from the State's administrative dataset. The data were analysed to identify the relationship between substance use, psychiatric problems, program completion and recidivism rates. Results: Analysis indicated that while non-DUI clients were more likely to present with more severe illicit substance use problems, DUI clients were more likely to have a primary problem with alcohol. Additionally, a cannabis use problem was also found to be significantly associated with DUI recidivism in the last year. In regards to mental health needs, a major finding was that depression was the most common psychiatric condition reported by DUI clients, including those with more than one DUI offence in the past year. This group were also more at risk of being diagnosed with Bipolar Disorder compared to the general population, and such a diagnosis was also associated with an increased likelihood of not completing treatment. Interestingly, female DUI and non-DUI clients were also more likely to be diagnosed with mental health problems compared to males, as well as more likely to be placed on medications at admission and have problems with methamphetamine, cocaine, and opiates. Conclusion: The findings highlight the complex competing needs of some DUI offenders who enter treatment. The results also suggest that there is a need to utilise mental health and substance abuse screening methods to ensure DUI offenders are directed towards appropriate treatment pathways as well as ensure that such interventions adequately cater for complex substance abuse and psychiatric needs.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Aims: Driving Under the Influence (DUI) enforcement can be a broad screening mechanism for alcohol and other drug problems. The current response to DUI is focused on using mechanical means to prevent inebriated persons from driving, with little attention the underlying substance abuse problems. ---------- Methods: This is a secondary analysis of an administrative dataset of over 345,000 individuals who entered Texas substance abuse treatment between 2005 and 2008. Of these, 36,372 were either on DUI probation, referred to treatment by probation, or had a DUI arrest in the past year. The DUI offenders were compared on demographic characteristics, substance use patterns, and levels of impairment with those who were not DUI offenders and first DUI offenders were compared with those with more than one past-year offense. T tests and chi square tests were used to determine significance. ---------- Results: DUI offenders were more likely to be employed, to have a problem with alcohol, to report more past-year arrests for any offense, to be older, and to have used alcohol and drugs longer than the non-DUI clients who reported higher ASI scores and were more likely to use daily. Those with one past-year DUI arrest were more likely to have problems with drugs other than alcohol and were less impaired than those with two or more arrests based on their ASI scores and daily use. Non-DUI clients reported higher levels of mood disorders than DUIs but there was no difference in their diagnosis of anxiety. Similar findings were found between those with one or multiple DUI arrests. ----------Conclusion: Although first-time DUIs were not as impaired as non-DUI clients, their levels of impairment were sufficient to cause treatment. Screening and brief intervention at arrest for all DUI offenders and treatment in combination with abstinence monitoring could decrease future recidivism.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Online dating networks, a type of social network, are gaining popularity. With many people joining and being available in the network, users are overwhelmed with choices when choosing their ideal partners. This problem can be overcome by utilizing recommendation methods. However, traditional recommendation methods are ineffective and inefficient for online dating networks where the dataset is sparse and/or large and two-way matching is required. We propose a methodology by using clustering, SimRank to recommend matching candidates to users in an online dating network. Data from a live online dating network is used in evaluation. The success rate of recommendation obtained using the proposed method is compared with baseline success rate of the network and the performance is improved by double.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Due to the change in attitudes and lifestyles, people expect to find new partners and friends via various ways now-a-days. Online dating networks create a network for people to meet each other and allow making contact with different objectives of developing a personal, romantic or sexual relationship. Due to the higher expectation of users, online matching companies are trying to adopt recommender systems. However, the existing recommendation techniques such as content-based, collaborative filtering or hybrid techniques focus on users explicit contact behaviors but ignore the implicit relationship among users in the network. This paper proposes a social matching system that uses past relations and user similarities in finding potential matches. The proposed system is evaluated on the dataset collected from an online dating network. Empirical analysis shows that the recommendation success rate has increased to 31% as compared to the baseline success rate of 19%.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

When performances are evaluated they are very often presented in a sequential order. Previous research suggests that the sequential presentation of alternatives may induce systematic biases in the way performances are evaluated. Such a phenomenon has been scarcely studied in economics. Using a large dataset of performance evaluation in the Idol series (N=1522), this paper presents new evidence about the systematic biases in sequential evaluation of performances and the psychological phenomena at the origin of these biases.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Scientists need to transfer semantically similar queries across multiple heterogeneous linked datasets. These queries may require data from different locations and the results are not simple to combine due to differences between datasets. A query model was developed to make it simple to distribute queries across different datasets using RDF as the result format. The query model, based on the concept of publicly recognised namespaces for parts of each scientific dataset, was implemented with a configuration that includes a large number of current biological and chemical datasets. The configuration is flexible, providing the ability to transparently use both private and public datasets in any query. A prototype implementation of the model was used to resolve queries for the Bio2RDF website, including both Bio2RDF datasets and other datasets that do not follow the Bio2RDF URI conventions.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Personalised social matching systems can be seen as recommender systems that recommend people to others in the social networks. However, with the rapid growth of users in social networks and the information that a social matching system requires about the users, recommender system techniques have become insufficiently adept at matching users in social networks. This paper presents a hybrid social matching system that takes advantage of both collaborative and content-based concepts of recommendation. The clustering technique is used to reduce the number of users that the matching system needs to consider and to overcome other problems from which social matching systems suffer, such as cold start problem due to the absence of implicit information about a new user. The proposed system has been evaluated on a dataset obtained from an online dating website. Empirical analysis shows that accuracy of the matching process is increased, using both user information (explicit data) and user behavior (implicit data).

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper proposes the use of the Bayes Factor as a distance metric for speaker segmentation within a speaker diarization system. The proposed approach uses a pair of constant sized, sliding windows to compute the value of the Bayes Factor between the adjacent windows over the entire audio. Results obtained on the 2002 Rich Transcription Evaluation dataset show an improved segmentation performance compared to previous approaches reported in literature using the Generalized Likelihood Ratio. When applied in a speaker diarization system, this approach results in a 5.1% relative improvement in the overall Diarization Error Rate compared to the baseline.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The XML Document Mining track was launched for exploring two main ideas: (1) identifying key problems and new challenges of the emerging field of mining semi-structured documents, and (2) studying and assessing the potential of Machine Learning (ML) techniques for dealing with generic ML tasks in the structured domain, i.e., classification and clustering of semi-structured documents. This track has run for six editions during INEX 2005, 2006, 2007, 2008, 2009 and 2010. The first five editions have been summarized in previous editions and we focus here on the 2010 edition. INEX 2010 included two tasks in the XML Mining track: (1) unsupervised clustering task and (2) semi-supervised classification task where documents are organized in a graph. The clustering task requires the participants to group the documents into clusters without any knowledge of category labels using an unsupervised learning algorithm. On the other hand, the classification task requires the participants to label the documents in the dataset into known categories using a supervised learning algorithm and a training set. This report gives the details of clustering and classification tasks.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Association rule mining has contributed to many advances in the area of knowledge discovery. However, the quality of the discovered association rules is a big concern and has drawn more and more attention recently. One problem with the quality of the discovered association rules is the huge size of the extracted rule set. Often for a dataset, a huge number of rules can be extracted, but many of them can be redundant to other rules and thus useless in practice. Mining non-redundant rules is a promising approach to solve this problem. In this paper, we first propose a definition for redundancy, then propose a concise representation, called a Reliable basis, for representing non-redundant association rules. The Reliable basis contains a set of non-redundant rules which are derived using frequent closed itemsets and their generators instead of using frequent itemsets that are usually used by traditional association rule mining approaches. An important contribution of this paper is that we propose to use the certainty factor as the criterion to measure the strength of the discovered association rules. Using this criterion, we can ensure the elimination of as many redundant rules as possible without reducing the inference capacity of the remaining extracted non-redundant rules. We prove that the redundancy elimination, based on the proposed Reliable basis, does not reduce the strength of belief in the extracted rules. We also prove that all association rules, their supports and confidences, can be retrieved from the Reliable basis without accessing the dataset. Therefore the Reliable basis is a lossless representation of association rules. Experimental results show that the proposed Reliable basis can significantly reduce the number of extracted rules. We also conduct experiments on the application of association rules to the area of product recommendation. The experimental results show that the non-redundant association rules extracted using the proposed method retain the same inference capacity as the entire rule set. This result indicates that using non-redundant rules only is sufficient to solve real problems needless using the entire rule set.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Video surveillance technology, based on Closed Circuit Television (CCTV) cameras, is one of the fastest growing markets in the field of security technologies. However, the existing video surveillance systems are still not at a stage where they can be used for crime prevention. The systems rely heavily on human observers and are therefore limited by factors such as fatigue and monitoring capabilities over long periods of time. To overcome this limitation, it is necessary to have “intelligent” processes which are able to highlight the salient data and filter out normal conditions that do not pose a threat to security. In order to create such intelligent systems, an understanding of human behaviour, specifically, suspicious behaviour is required. One of the challenges in achieving this is that human behaviour can only be understood correctly in the context in which it appears. Although context has been exploited in the general computer vision domain, it has not been widely used in the automatic suspicious behaviour detection domain. So, it is essential that context has to be formulated, stored and used by the system in order to understand human behaviour. Finally, since surveillance systems could be modeled as largescale data stream systems, it is difficult to have a complete knowledge base. In this case, the systems need to not only continuously update their knowledge but also be able to retrieve the extracted information which is related to the given context. To address these issues, a context-based approach for detecting suspicious behaviour is proposed. In this approach, contextual information is exploited in order to make a better detection. The proposed approach utilises a data stream clustering algorithm in order to discover the behaviour classes and their frequency of occurrences from the incoming behaviour instances. Contextual information is then used in addition to the above information to detect suspicious behaviour. The proposed approach is able to detect observed, unobserved and contextual suspicious behaviour. Two case studies using video feeds taken from CAVIAR dataset and Z-block building, Queensland University of Technology are presented in order to test the proposed approach. From these experiments, it is shown that by using information about context, the proposed system is able to make a more accurate detection, especially those behaviours which are only suspicious in some contexts while being normal in the others. Moreover, this information give critical feedback to the system designers to refine the system. Finally, the proposed modified Clustream algorithm enables the system to both continuously update the system’s knowledge and to effectively retrieve the information learned in a given context. The outcomes from this research are: (a) A context-based framework for automatic detecting suspicious behaviour which can be used by an intelligent video surveillance in making decisions; (b) A modified Clustream data stream clustering algorithm which continuously updates the system knowledge and is able to retrieve contextually related information effectively; and (c) An update-describe approach which extends the capability of the existing human local motion features called interest points based features to the data stream environment.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Automated visual surveillance of crowds is a rapidly growing area of research. In this paper we focus on motion representation for the purpose of abnormality detection in crowded scenes. We propose a novel visual representation called textures of optical flow. The proposed representation measures the uniformity of a flow field in order to detect anomalous objects such as bicycles, vehicles and skateboarders; and can be combined with spatial information to detect other forms of abnormality. We demonstrate that the proposed approach outperforms state-of-the-art anomaly detection algorithms on a large, publicly-available dataset.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Recent research has begun to address and even compare nascent entrepreneurship and nascent corporate entrepreneurship. An opportunity based view holds great potential to integrate both streams of research, but also presents challenges in how we define corporate entrepreneurship. We extend (corporate) entrepreneurship literature to the opportunity identification phase by providing a framework to classify different types of corporate entrepreneurship. Through analysis of a large dataset on nascent (corporate) entrepreneurship (PSEDII) we show that these corporate entrepreneurs differ largely from each other in terms of human capital. Prior studies have indicated that independent and corporate entrepreneurs pursue different types of opportunities and utilize different strategies. Our findings from the opportunity identification phase challenge those differences and seem to indicate a difference between the opportunities corporate entrepreneurs identify versus the opportunities they exploit.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In today’s electronic world vast amounts of knowledge is stored within many datasets and databases. Often the default format of this data means that the knowledge within is not immediately accessible, but rather has to be mined and extracted. This requires automated tools and they need to be effective and efficient. Association rule mining is one approach to obtaining knowledge stored with datasets / databases which includes frequent patterns and association rules between the items / attributes of a dataset with varying levels of strength. However, this is also association rule mining’s downside; the number of rules that can be found is usually very big. In order to effectively use the association rules (and the knowledge within) the number of rules needs to be kept manageable, thus it is necessary to have a method to reduce the number of association rules. However, we do not want to lose knowledge through this process. Thus the idea of non-redundant association rule mining was born. A second issue with association rule mining is determining which ones are interesting. The standard approach has been to use support and confidence. But they have their limitations. Approaches which use information about the dataset’s structure to measure association rules are limited, but could yield useful association rules if tapped. Finally, while it is important to be able to get interesting association rules from a dataset in a manageable size, it is equally as important to be able to apply them in a practical way, where the knowledge they contain can be taken advantage of. Association rules show items / attributes that appear together frequently. Recommendation systems also look at patterns and items / attributes that occur together frequently in order to make a recommendation to a person. It should therefore be possible to bring the two together. In this thesis we look at these three issues and propose approaches to help. For discovering non-redundant rules we propose enhanced approaches to rule mining in multi-level datasets that will allow hierarchically redundant association rules to be identified and removed, without information loss. When it comes to discovering interesting association rules based on the dataset’s structure we propose three measures for use in multi-level datasets. Lastly, we propose and demonstrate an approach that allows for association rules to be practically and effectively used in a recommender system, while at the same time improving the recommender system’s performance. This especially becomes evident when looking at the user cold-start problem for a recommender system. In fact our proposal helps to solve this serious problem facing recommender systems.