3 results for Categorical trait
at Indian Institute of Science - Bangalore - India
Abstract:
The rapid growth in the field of data mining has led to the development of various methods for outlier detection. Though detection of outliers has been well explored in the context of numerical data, dealing with categorical data is still evolving. In this paper, we propose a two-phase algorithm for detecting outliers in categorical data based on a novel definition of outliers. In the first phase, this algorithm explores a clustering of the given data, followed by the ranking phase for determining the set of most likely outliers. The proposed algorithm is expected to perform better as it can identify different types of outliers, employing two independent ranking schemes based on the attribute value frequencies and the inherent clustering structure in the given data. Unlike some existing methods, the computational complexity of this algorithm is not affected by the number of outliers to be detected. The efficacy of this algorithm is demonstrated through experiments on various public-domain categorical data sets.
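As an illustration of what a ranking scheme based on attribute value frequencies might look like, the following minimal Python sketch scores each record by the average relative frequency of its attribute values and ranks the lowest-scoring records as the most likely outliers. It is not the algorithm proposed in the paper: the clustering phase is omitted entirely, and the function names `frequency_scores` and `rank_outliers`, the scoring formula, and the toy records are assumptions made for this example.

```python
from collections import Counter


def frequency_scores(records):
    """Return, for each record, the mean relative frequency of its attribute values.

    Assumption for this sketch: a record whose values are rare in their
    respective attributes gets a low score and is treated as more outlying.
    """
    n = len(records)
    n_attrs = len(records[0])
    # One frequency table per attribute (column).
    counts = [Counter(row[j] for row in records) for j in range(n_attrs)]
    return [
        sum(counts[j][row[j]] for j in range(n_attrs)) / (n_attrs * n)
        for row in records
    ]


def rank_outliers(records, k):
    """Return the indices of the k lowest-scoring records, most outlying first."""
    scores = frequency_scores(records)
    order = sorted(range(len(records)), key=lambda i: scores[i])
    return order[:k]


# Hypothetical toy data: three categorical attributes per record.
records = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("red", "small", "round"),
    ("blue", "large", "square"),
]

print(rank_outliers(records, 2))
```

In this toy data the last record carries two values that occur only once, so it receives the lowest score. Note that in this simple scheme the scores for all records are computed once, so the cost of scoring does not grow with `k`.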
Abstract:
Outlier detection in high-dimensional categorical data has been a problem of much interest due to the extensive use of qualitative features for describing the data across various application areas. Though there exist various established methods for dealing with the dimensionality aspect through feature selection on numerical data, the categorical domain is still being actively explored. As outlier detection is generally considered an unsupervised learning problem due to the lack of knowledge about the nature of various types of outliers, the related feature selection task also needs to be handled in a similar manner. This motivates the need to develop an unsupervised feature selection algorithm for efficient detection of outliers in categorical data. Addressing this aspect, we propose a novel feature selection algorithm based on the mutual information measure and the entropy computation. The redundancy among the features is characterized using the mutual information measure for identifying a suitable feature subset with less redundancy. The performance of the proposed algorithm in comparison with information-gain-based feature selection shows its effectiveness for outlier detection. The efficacy of the proposed algorithm is demonstrated on various high-dimensional benchmark data sets employing two existing outlier detection methods.
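To make the redundancy idea concrete, the sketch below computes the Shannon entropy of a categorical feature and the mutual information between two features from empirical counts, using the identity I(X;Y) = H(X) + H(Y) - H(X,Y). It only illustrates the two quantities named in the abstract; how the paper combines them into a selection procedure is not stated here, and the function names and toy features are assumptions made for this example.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Empirical Shannon entropy (in bits) of a sequence of categorical values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """Empirical mutual information I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


# Hypothetical toy features over four records.
color = ["red", "red", "blue", "blue"]
shade = ["dark", "dark", "light", "light"]  # a relabelling of color: fully redundant
size = ["S", "L", "S", "L"]                 # independent of color in this sample

print(mutual_information(color, shade))
print(mutual_information(color, size))
```

A pair of features with high mutual information, such as `color` and `shade` here, carries largely the same information, so a redundancy-aware selection would typically keep only one of them.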
Abstract:
Primates constitute 25-40% of the frugivore biomass of tropical forests. Primate fruit preference, as a determinant of seed dispersal, can therefore have a significant impact on these ecosystems. Although the traits of fruits included in primate diets have been described, fruit trait preference has been less studied with respect to fruit availability. We examined fruit trait preference and its implications for seed dispersal in the rhesus macaque (Macaca mulatta), a dietarily flexible species and important seed disperser, at the Buxa Tiger Reserve, India. Over a year, we monitored the phenology of selected trees in the study area, observed the feeding behavior of rhesus macaques using scans and focal animal sampling, and documented morphological traits of the fruits/seeds consumed. Using generalized linear modeling, we found that the kind of edible tissue was the chief determinant of fruit consumption, with M. mulatta feeding primarily on fruits with juicy-soft pulp and acting as seed predators for those with no discernible pulp. Overall, the preferred traits were external covers that could be easily pierced by a fingernail, medium to large seeds, true stone-like seeds, and juicy-soft edible tissue, thereby implying that fruit taxa with these traits had a higher probability of being dispersed. Macaques were more selective during the high fruit availability period than the low fruit availability period, preferentially feeding on soft-skinned fruits with juicy-soft pulp. We suggest that further studies be conducted across habitats and time to understand the consistency of interactions between primates and fruits with specific traits to determine the degree of selective pressure (if any) that is exerted by primates on fruit traits.