912 results for hierarchical clustering techniques
Abstract:
In this paper we describe the approaches adopted to generate the runs submitted to ImageCLEFPhoto 2009 with the aim of promoting document diversity in the rankings. Four of our runs are text-based approaches that employ textual statistics extracted from the captions of images: MMR [1] as a state-of-the-art method for result diversification, two approaches that combine relevance information and clustering techniques, and an instantiation of the Quantum Probability Ranking Principle. The fifth run exploits visual features of the provided images to re-rank the initial results by means of Factor Analysis. The results reveal that our methods based only on text captions consistently improve the performance of the respective baselines, while the approach that combines visual features with textual statistics shows smaller improvements.
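As an illustration of the MMR idea mentioned in this abstract, here is a minimal sketch of greedy Maximal Marginal Relevance re-ranking. It is not the authors' implementation: the function name `mmr_rerank`, the trade-off parameter `lam`, and the toy relevance and similarity values are all assumptions chosen for the example.

```python
def mmr_rerank(relevance, similarity, lam=0.7, k=None):
    """Greedy MMR: repeatedly pick the document that best balances
    relevance against similarity to the documents already selected.

    relevance  -- list of relevance scores, one per document
    similarity -- square list-of-lists of pairwise document similarities
    lam        -- hypothetical trade-off weight (1.0 = pure relevance)
    k          -- number of documents to return (default: all)
    """
    n = len(relevance)
    k = n if k is None else min(k, n)
    selected = []
    remaining = list(range(n))
    while remaining and len(selected) < k:
        def score(i):
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return lam * relevance[i] - (1.0 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (made up): documents 0 and 1 are near-duplicates, 2 is different.
toy_relevance = [0.9, 0.85, 0.3]
toy_similarity = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
print(mmr_rerank(toy_relevance, toy_similarity, lam=0.5))
```

With a lower `lam`, the near-duplicate of the top document is pushed down the ranking in favour of the dissimilar one, which is the diversification effect the abstract refers to.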
Abstract:
Travel speed is one of the most critical parameters for road safety; the evidence suggests that increased vehicle speed is associated with higher crash risk and injury severity. Both naturalistic and simulator studies have reported that drivers distracted by a mobile phone select a lower driving speed. Speed decrements have been argued to be a risk compensatory behaviour of distracted drivers. Nonetheless, the extent and circumstances of the speed change among distracted drivers are still not known very well. As such, the primary objective of this study was to investigate patterns of speed variation in relation to contextual factors and distraction. Using the CARRS-Q high-fidelity Advanced Driving Simulator, the speed selection behaviour of 32 drivers aged 18-26 years was examined in two phone conditions: baseline (no phone conversation) and handheld phone operation. The simulator driving route contained five different types of road traffic complexities, including one road section with a horizontal S curve, one horizontal S curve with adjacent traffic, one straight segment of suburban road without traffic, one straight segment of suburban road with traffic interactions, and one road segment in a city environment. Speed deviations from the posted speed limit were analysed using Ward’s Hierarchical Clustering method to identify the effects of road traffic environment and cognitive distraction. The speed deviations along curved road sections formed two different clusters for the two phone conditions, implying that distracted drivers adopt a different strategy for selecting driving speed in a complex driving situation. In particular, distracted drivers selected a lower speed while driving along a horizontal curve. 
The speed deviation along the city road segment and other straight road segments grouped into a different cluster, and the deviations were not significantly different across phone conditions, suggesting a negligible effect of distraction on speed selection along these road sections. Future research should focus on developing a risk compensation model to explain the relationship between road traffic complexity and distraction.
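The clustering step described in this abstract can be sketched with a small pure-Python version of Ward's agglomerative method, which at each step merges the two clusters whose union increases the within-cluster sum of squares the least. This is only a sketch under stated assumptions: the function name `ward_cluster` is hypothetical, and the speed-deviation values below are invented for illustration and are not taken from the study.

```python
def ward_cluster(points, n_clusters):
    """Agglomerative clustering with Ward's merge criterion.

    points     -- list of equal-length numeric sequences
    n_clusters -- number of clusters to stop at (must be >= 1)
    Returns a list of clusters, each a sorted list of point indices.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [{"members": [i], "centroid": list(p)} for i, p in enumerate(points)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca, cb = clusters[a], clusters[b]
                na, nb = len(ca["members"]), len(cb["members"])
                d2 = sum((x - y) ** 2 for x, y in zip(ca["centroid"], cb["centroid"]))
                # Increase in within-cluster sum of squares if a and b merge.
                cost = na * nb / (na + nb) * d2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        ca, cb = clusters[a], clusters[b]
        na, nb = len(ca["members"]), len(cb["members"])
        merged_centroid = [
            (na * x + nb * y) / (na + nb)
            for x, y in zip(ca["centroid"], cb["centroid"])
        ]
        clusters[a] = {"members": ca["members"] + cb["members"],
                       "centroid": merged_centroid}
        del clusters[b]  # b > a, so index a is unaffected
    return [sorted(c["members"]) for c in clusters]


# Invented mean speed deviations (km/h from the posted limit), one value per
# road-section/phone-condition combination; purely illustrative.
toy_deviations = [[-8.0], [-7.5], [0.5], [1.0], [0.8]]
print(ward_cluster(toy_deviations, n_clusters=2))
```

The quadratic search over cluster pairs is fine for a handful of observations like these; for larger data sets a library routine would be the more practical choice.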
Abstract:
Objectives In China, “serious road traffic crashes” (SRTCs) are those in which there are 10-30 fatalities, 50-100 serious injuries or a total cost of 50-100 million RMB ($US8-16m), and “particularly serious road traffic crashes” (PSRTCs) are those which are more severe or costly. Due to the large number of fatalities and injuries as well as the negative public reaction they elicit, SRTCs and PSRTCs have become great concerns to China during recent years. The aim of this study is to identify the main factors contributing to these road traffic crashes and to propose preventive measures to reduce their number. Methods 49 contributing factors of the SRTCs and PSRTCs that occurred from 2007 to 2013 were collected from the database “In-depth Investigation and Analysis System for Major Road traffic crashes” (IIASMRTC) and were analyzed through the integrated use of principal component analysis and hierarchical clustering to determine the primary and secondary groups of contributing factors. Results Speeding and overloading of passengers were the primary contributing factors, featuring in up to 66.3% and 32.6% of accidents respectively. Two secondary contributing factors were road-related: lack of or nonstandard roadside safety infrastructure, and slippery roads due to rain, snow or ice. Conclusions The current approach to SRTCs and PSRTCs is focused on the attribution of responsibility and the enforcement of regulations considered relevant to particular SRTCs and PSRTCs. It would be more effective to investigate contributing factors and characteristics of SRTCs and PSRTCs as a whole, to provide adequate information for safety interventions in regions where SRTCs and PSRTCs are more common. 
In addition to mandating a driver training program and publicising the hazards associated with traffic violations, speed cameras, speed signs, road markings and vehicle-mounted GPS are suggested to reduce speeding by passenger vehicles. More frequent checks by traffic police and passenger station staff, together with improved transportation management to raise the income of contractors and drivers, are feasible measures to prevent passenger overloading. Other promising measures include regular inspection of roadside safety infrastructure and improving skid resistance on dangerous road sections in mountainous areas.
Abstract:
Core Vector Machine (CVM) is suitable for efficient large-scale pattern classification. In this paper, we propose a method for improving the performance of CVM with a Gaussian kernel function irrespective of the ordering of patterns belonging to different classes within the data set. The method trains the CVM with selective sampling, using a novel kernel-based scalable hierarchical clustering algorithm. Empirical studies on synthetic and real-world data sets show that the proposed strategy performs well on large data sets.
Abstract:
Three classification techniques, namely K-means Cluster Analysis (KCA), Fuzzy Cluster Analysis (FCA), and Kohonen Neural Networks (KNN), were employed to group 25 microwatersheds of the Kherthal watershed, Rajasthan, into homogeneous groups as a basis for formulating suitable conservation and management practices. Ten parameters, mainly morphological, are used for the classification: drainage density (D-d), bifurcation ratio (R-b), stream frequency (F-u), length of overland flow (L-o), form factor (R-f), shape factor (B-s), elongation ratio (R-e), circulatory ratio (R-c), compactness coefficient (C-c) and texture ratio (T). The optimal number of groups is chosen based on two cluster validation indices, Davies-Bouldin and Dunn's. Comparative analysis of the clustering techniques revealed that 13 of the 25 microwatersheds (52%) are commonly suggested by KCA, FCA and KNN; 17 of 25 (68%) by KCA and FCA; 16 of 25 (64%) by FCA and KNN; and 15 of 25 (60%) by KNN and KCA. KNN sensitivity analysis shows that the effect of the number of epochs (1000, 3000, 5000) and the learning rate (0.01, 0.1-0.9) on total squared error values is significant, although no fixed trend is observed. Sensitivity analysis also revealed that, when the number of groups is increased from 5 to 6, 7 and 8, microwatersheds occupy all the groups, although the number in each group differs. (C) 2010 International Association of Hydro-environment Engineering and Research, Asia Pacific Division. Published by Elsevier B.V. All rights reserved.
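One of the two validation indices named in this abstract, Davies-Bouldin, can be computed in a few lines. The sketch below is a generic illustration rather than the authors' procedure: the function name `davies_bouldin` and the four toy points are assumptions, and it uses Euclidean distance via `math.dist` (Python 3.8 or later). Lower values indicate more compact, better separated groups, so the number of groups with the lowest index would be preferred.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index for a labelled partition (lower is better).

    points -- list of equal-length numeric sequences
    labels -- cluster label for each point
    Assumes at least two clusters with distinct centroids.
    """
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    if len(groups) < 2:
        raise ValueError("need at least two clusters")
    # Centroid of each cluster, computed column by column.
    centroids = {
        lab: [sum(col) / len(pts) for col in zip(*pts)]
        for lab, pts in groups.items()
    }
    # Scatter: mean distance from the cluster's points to its centroid.
    scatter = {
        lab: sum(math.dist(p, centroids[lab]) for p in pts) / len(pts)
        for lab, pts in groups.items()
    }
    total = 0.0
    for i in groups:
        # Worst-case ratio of combined scatter to centroid separation.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in groups if j != i
        )
    return total / len(groups)


# Toy points (made up): two tight pairs far apart.
toy_points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
print(davies_bouldin(toy_points, [0, 0, 1, 1]))  # sensible grouping
print(davies_bouldin(toy_points, [0, 1, 0, 1]))  # poor grouping
```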
Abstract:
Establishing functional relationships between multi-domain protein sequences is a non-trivial task. Traditionally, delineating the functional assignment and relationships of proteins requires domain assignments as a prerequisite. This process is sensitive to alignment quality and domain definitions. In multi-domain proteins, the quality of alignments is poor for multiple reasons. We report the correspondence between the classification of proteins represented as full-length gene products and their functions. Our approach differs fundamentally from traditional methods in not performing the classification at the level of domains. Our method is based on an alignment-free computation of local matching scores (LMS) at the amino-acid sequence level, followed by hierarchical clustering. As there are no gold standards for full-length protein sequence classification, we resorted to Gene Ontology and domain-architecture-based similarity measures to assess our classification. The final clusters obtained using LMS show high functional and domain-architectural similarities. Comparison of the current method with alignment-based approaches at both the domain and full-length protein levels showed the superiority of the LMS scores. Using this method we have recreated objective relationships among different protein kinase sub-families and also classified immunoglobulin-containing proteins for which sub-family definitions do not currently exist. This method can be applied to any set of protein sequences and hence will be instrumental in the analysis of large numbers of full-length protein sequences.
Abstract:
We investigated the site response characteristics of the Kachchh rift basin over the meizoseismal area of the 2001, Mw 7.6, Bhuj (NW India) earthquake using the spectral ratio of the horizontal and vertical components of ambient vibrations. Using the available knowledge of the regional geology of Kachchh and well-documented ground responses from the earthquake, we evaluated the pattern of H/V curves across sediment-filled valleys and uplifted areas generally characterized by weathered sandstones. Although our H/V curves showed a largely fuzzy nature, we found that the hierarchical clustering method was useful for comparing large numbers of response curves and identifying areas with similar responses. Broad and plateau-shaped peaks of a cluster of curves within the valley region suggest the possibility of basin effects within the valley. Fundamental resonance frequencies (f(0)) are found in the narrow range of 0.1-2.3 Hz, and their spatial distribution demarcated the uplifted regions from the valleys. In contrast, low H/V peak amplitudes (A(0) = 2-4) were observed on the uplifted areas and varying values (2-9) were found within valleys. Compared to the amplification factors, the liquefaction indices (Kg) were able to effectively indicate the areas which experienced severe liquefaction. The amplification ranges obtained in the current study were found to be comparable to those obtained from earthquake data for a limited number of seismic stations located on uplifted areas; however, the values in the valley region may not reflect their true amplification potential due to basin effects. Our study highlights the practical usefulness as well as the limitations of the H/V method for studying complex geological settings such as Kachchh. (C) 2014 Elsevier Ltd. All rights reserved.
Abstract:
This study analyzed species richness, distribution, and sighting frequency of selected reef fishes to describe species assemblage composition, abundance, and spatial distribution patterns among sites and regions (Upper Keys, Middle Keys, Lower Keys, and Dry Tortugas) within the Florida Keys National Marine Sanctuary (FKNMS) barrier reef ecosystem. Data were obtained from the Reef Environmental Education Foundation (REEF) Fish Survey Project, a volunteer fish-monitoring program. A total of 4,324 visual fish surveys conducted at 112 sites throughout the FKNMS were used in these analyses. The data set contained sighting information on 341 fish species comprising 68 families. Species richness was generally highest in the Upper Keys sites (maximum was 220 species at Molasses Reef) and lowest in the Dry Tortugas sites. Encounter rates differed among regions, with the Dry Tortugas having the highest rate, potentially a result of differences in the evenness in fishes and the lower diversity of habitat types in the Dry Tortugas region. Geographic coverage maps were developed for 29 frequently observed species. Fourteen of these species showed significant regional variation in mean sighting frequency (%SF). Six species had significantly lower mean %SF and eight species had significantly higher mean %SF in the Dry Tortugas compared with other regions. Hierarchical clustering based on species composition (presence-absence) and species % SF revealed interesting patterns of similarities among sites that varied across spatial scales. 
Results presented here indicate that phenomena affecting reef fish composition in the FKNMS operate at multiple spatial scales, including a biogeographic scale that defines the character of the region as a whole, a reef scale (~50-100 km) that includes meso-scale physical oceanographic processes and regional variation in reef structure and associated reef habitats, and a local scale that includes level of protection, cross-shelf location and a suite of physical characteristics of a given reef. It is likely that at both regional and local scales, species habitat requirements strongly influence the patterns revealed in this study, and are particularly limiting for species that are less frequently observed in the Dry Tortugas. The results of this report serve as a benchmark for the current status of the reef fishes in the FKNMS. In addition, these data provide the basis for analyses of reserve effects and the biogeographic coupling of benthic habitats and fish assemblages that are currently underway. (PDF contains 61 pages.)
Abstract:
Elucidating the intricate relationship between brain structure and function, in both healthy and pathological conditions, is a key challenge for modern neuroscience. Recent progress in neuroimaging has helped advance our understanding of this important issue, with diffusion images providing information about structural connectivity (SC) and functional magnetic resonance imaging shedding light on resting state functional connectivity (rsFC). Here, we adopt a systems approach, relying on modular hierarchical clustering, to jointly study SC and rsFC datasets gathered independently from healthy human subjects. Our novel approach allows us to find a common skeleton shared by structure and function from which a new, optimal, brain partition can be extracted. We describe the emerging common structure-function modules (SFMs) in detail and compare them with commonly employed anatomical or functional parcellations. Our results underline the strong correspondence between brain structure and resting-state dynamics as well as the emerging coherent organization of the human brain.
Abstract:
This dissertation presents results from applying adaptive filters, using the NLMS (Normalized Least Mean Square) and RLS (Recursive Least Square) algorithms, to reduce deviations in climate forecasts. The discrepancies between the actual state of the atmosphere and the state predicted by a numerical model tend to grow over the integration period. The Eta atmospheric model is used operationally for numerical forecasting at CPTEC/INPE and, like other atmospheric models, produces imprecise climate forecasts. Some research aims to improve the Eta atmospheric model, while other work evaluates the forecasts and identifies the model's errors so that its products can be used appropriately. Accordingly, this work aims to filter the data produced by the Eta model and adjust them so as to minimize the errors between the results provided by the Eta model and the NCEP reanalyses. To that end, we employ digital signal and image processing techniques to reduce the errors of the Eta model's climate forecasts. The adaptive filters in this dissertation adjust the series over the forecast period. To train the filters, region-grouping techniques such as the k-means clustering algorithm were used to select climate series that behave similarly to one another. The climate variables studied are the meridional wind and the geopotential height in the region covered by the Eta atmospheric forecast model with a resolution of 40 km, at a pressure level of 250 hPa. Finally, the results show that the filter with 4 coefficients, adapted by the RLS algorithm together with the region-selection criterion based on the k-means algorithm, performs best in reducing the mean error and the error dispersion, for both the meridional wind and the geopotential height.
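To make the adaptive-filtering idea concrete, here is a minimal sketch of an NLMS filter that adjusts a forecast series towards a reference series. It shows NLMS only because it is the simpler of the two algorithms named in the abstract (the abstract reports RLS as the better performer). The function name `nlms_filter`, the step size `mu`, the regularisation constant `eps`, and the synthetic series are all assumptions; no data from the dissertation are used.

```python
import math


def nlms_filter(inputs, desired, n_taps=4, mu=0.5, eps=1e-8):
    """Normalized LMS adaptive filter.

    inputs  -- series to be corrected (e.g. a model forecast)
    desired -- reference series of the same length (e.g. a reanalysis)
    n_taps  -- number of filter coefficients
    mu      -- step size, typically between 0 and 2
    eps     -- small constant that avoids division by zero
    Returns (outputs, errors, final_weights).
    """
    weights = [0.0] * n_taps
    outputs, errors = [], []
    for t in range(len(inputs)):
        # Most recent n_taps samples, newest first, zero-padded at the start.
        window = [inputs[t - k] if t - k >= 0 else 0.0 for k in range(n_taps)]
        y = sum(w * x for w, x in zip(weights, window))
        e = desired[t] - y
        # Normalising by the window energy makes the step scale-invariant.
        norm = eps + sum(x * x for x in window)
        weights = [w + mu * e * x / norm for w, x in zip(weights, window)]
        outputs.append(y)
        errors.append(e)
    return outputs, errors, weights


# Synthetic example (made up): the "forecast" overestimates the "reference"
# by a constant gain, which the filter should learn to correct.
forecast = [1.0 + 0.5 * math.sin(0.3 * t) for t in range(200)]
reference = [0.8 * x for x in forecast]
_, errs, w = nlms_filter(forecast, reference, n_taps=4)
print(abs(errs[0]), abs(errs[-1]))
```

In the setting the abstract describes, one filter would presumably be trained per group of similar series, with the groups chosen by k-means beforehand.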
Abstract:
We introduce the Pitman-Yor Diffusion Tree (PYDT) for hierarchical clustering, a generalization of the Dirichlet Diffusion Tree (Neal, 2001) which removes the restriction to binary branching structure. The generative process is described and shown to result in an exchangeable distribution over data points. We prove some theoretical properties of the model and then present two inference methods: a collapsed MCMC sampler which allows us to model uncertainty over tree structures, and a computationally efficient greedy Bayesian EM search algorithm. Both algorithms use message passing on the tree structure. The utility of the model and algorithms is demonstrated on synthetic and real-world data, both continuous and binary.
Abstract:
The capability to automatically identify shapes, objects and materials from image content through direct and indirect methodologies has enabled the development of several civil engineering applications that assist in the design, construction and maintenance of construction projects. This capability is a product of technological breakthroughs in the area of Image Processing, which have allowed for the development of a large number of digital imaging applications in all industries. In this paper, an automated, content-based shape recognition model is presented. This model was devised to enhance the recognition capabilities of our existing material-based image retrieval model. The shape recognition model is based on clustering techniques, specifically those related to material and object segmentation. The model detects the borders of each previously detected material depicted in the image, examines its linearity (length/width ratio) and detects its orientation (horizontal/vertical). The results demonstrate the suitability of this model for construction site image retrieval purposes and reveal the capability of existing clustering technologies to accurately identify the shape of a wealth of materials from construction site images.
Abstract:
Multivariate classification methods were used to evaluate data on the concentrations of eight metals in human senile lenses measured by atomic absorption spectrometry. Principal components analysis and hierarchical clustering separated senile cataract lenses, nuclei from cataract lenses, and normal lenses into three classes on the basis of the eight elements. Stepwise discriminant analysis was applied to give discriminant functions with five selected variables. Results provided by the linear learning machine method were also satisfactory; the k-nearest neighbour method was less useful.
Abstract:
Small failures should only disrupt a small part of a network. One way to achieve this is to mark the surrounding area as untrustworthy, thereby circumscribing the failure. This can be done with a distributed algorithm using hierarchical clustering and neighbor relations, and the resulting circumscription is near-optimal for convex failures.
Abstract:
Q. Meng and M.H. Lee, 'Error-driven active learning in growing radial basis function networks for early robot learning', 2006 IEEE International Conference on Robotics and Automation (IEEE ICRA 2006), 2984-90, Orlando, Florida, USA.