987 results for cosmologia, clustering, AP-test
Abstract:
Clustering is an important technique for organising and categorising web-scale documents. The main challenges in clustering the billions of documents available on the web are the processing power required and the sheer size of the datasets. More importantly, it is nearly impossible to generate labels for a general web document collection containing billions of documents and a vast taxonomy of topics. However, document clusters are most commonly evaluated by comparison to a ground truth set of labels for documents. This paper presents a clustering and labeling solution in which Wikipedia is clustered and hundreds of millions of web documents in ClueWeb12 are mapped onto those clusters. This solution is based on the assumption that Wikipedia contains such a wide range of diverse topics that it represents a small-scale web. We found that it was possible to perform the web-scale document clustering and labeling process on one desktop computer in under a couple of days for the Wikipedia clustering solution containing about 1000 clusters. It takes longer to execute a solution with finer-granularity clusters, such as 10,000 or 50,000. These results were evaluated using a set of external data.
Abstract:
It is important that we understand the factors and conditions that shape driver behaviour: those conditions within the road transport system that contribute to driver error, and the situations where driver non-compliance with road regulations is likely. This report presents the findings derived from a program of research investigating the nature of errors made by drivers, involving a literature review and an on-road study. The review indicates that, despite significant investigation, the role of different error types in road traffic crashes remains unclear, as does the role of wider road transport system failures in driver error causation.
Abstract:
Seat belts are one of the most effective passive safety features in vehicles and there is a host of research literature attesting to the effectiveness of seat belts in protecting against death and injury. Even when use rates are high, the potential gains in trauma reduction from further improvements in wearing rates are substantial. However, those currently most resistant to restraint use have also proven most difficult to target using conventional countermeasures. It is necessary to address the issues of non-wearing in order to achieve further gains in seat belt wearing. This study provides evidence-based recommendations for the way forward to tackle the problems of adult restraint non-use in light passenger vehicles in the short, medium and longer term in Australia. While there are substantial issues to be addressed for these groups, these are outside the scope of this study.
Abstract:
Long-term measurements of particle number size distribution (PNSD) produce a very large number of observations, and their analysis requires an efficient approach in order to produce results in the least possible time and with maximum accuracy. Clustering techniques are a family of sophisticated methods which have recently been employed to analyse PNSD data; however, very little information is available comparing the performance of different clustering techniques on PNSD data. This study aims to apply several clustering techniques (i.e. K-means, PAM, CLARA and SOM) to PNSD data, in order to identify and apply the optimum technique to PNSD data measured at 25 sites across Brisbane, Australia. A new method, based on the Generalised Additive Model (GAM) with a basis of penalised B-splines, was proposed to parameterise the PNSD data, and the temporal weight of each cluster was also estimated using the GAM. In addition, each cluster was associated with its possible source based on the results of this parameterisation, together with the characteristics of each cluster. The performance of the four clustering techniques was compared using the Dunn index and Silhouette width validation values, and the K-means technique was found to have the highest performance, with five clusters being the optimum. Five clusters were therefore identified in the data using the K-means technique. The diurnal occurrence of each cluster was used together with other air quality parameters, temporal trends and the physical properties of each cluster, in order to attribute each cluster to its source and origin. The five clusters were attributed to three major sources and origins, including regional background particles, photochemically induced nucleated particles and vehicle generated particles. Overall, clustering was found to be an effective technique for attributing each particle size spectrum to its source and the GAM was suitable to parameterise the PNSD data. These two techniques can help researchers immensely in analysing PNSD data for characterisation and source apportionment purposes.
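The abstract above names Silhouette width as one of the two indices used to compare clusterings. As a rough illustration only, the following sketch computes the mean silhouette width for a labelled set of points. It is not the authors' code: the function name `silhouette_width`, the use of Euclidean distance, and the toy data are all assumptions made for this example.

```python
from math import dist


def silhouette_width(points, labels):
    """Mean silhouette width of a clustering (Euclidean distance assumed)."""
    # Group points by their cluster label.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette width needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            # Common convention: a singleton cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster
        # (dist(p, p) == 0, so dividing by len(own) - 1 excludes p itself).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data (hypothetical): two well-separated groups of 2-D points.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_width(toy_points, [0, 0, 1, 1])  # matches the groups
bad = silhouette_width(toy_points, [0, 1, 0, 1])   # cuts across the groups
print(good, bad)
```

Values close to 1 indicate compact, well-separated clusters, while negative values suggest points assigned to the wrong cluster. Comparing this score across candidate numbers of clusters is one common way to choose an optimum.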
Abstract:
Automotive interactive technologies represent an exemplar challenge for user experience (UX) designers, as the concerns for aesthetics, functionality and usability add to the compelling issues of safety and cognitive demand. This extended abstract presents a methodology for the user-centred creation and evaluation of novel in-car applications, involving real users in realistic use settings. As a case study, we present the methodologies of an ideation workshop in a simulated environment and the evaluation of six design idea prototypes for in-vehicle head up display (HUD) applications using a semi-naturalistic drive. Both methods rely on video recordings of real traffic situations that the users are familiar with and/or have experienced themselves. The extended abstract presents experiences and results from the evaluation and reflection on our methods.
Abstract:
Many researchers in the field of civil structural health monitoring (SHM) have developed and tested their methods on simple to moderately complex laboratory structures such as beams, plates, frames, and trusses. Fieldwork has also been conducted by many researchers and practitioners on more complex operating bridges. Most laboratory structures do not adequately replicate the complexity of truss bridges. Informed by a brief review of the literature, this paper documents the design and proposed test plan of a structurally complex laboratory bridge model that has been specifically designed for the purpose of SHM research. Preliminary results have been presented in the companion paper.
Abstract:
The proliferation of the web presents an unsolved problem of automatically analyzing billions of pages of natural language. We introduce a scalable algorithm that clusters hundreds of millions of web pages into hundreds of thousands of clusters. It does this on a single mid-range machine using efficient algorithms and compressed document representations. It is applied to two web-scale crawls covering tens of terabytes. ClueWeb09 and ClueWeb12 contain 500 and 733 million web pages and were clustered into 500,000 to 700,000 clusters. To the best of our knowledge, such fine grained clustering has not been previously demonstrated. Previous approaches clustered a sample that limits the maximum number of discoverable clusters. The proposed EM-tree algorithm uses the entire collection in clustering and produces several orders of magnitude more clusters than the existing algorithms. Fine grained clustering is necessary for meaningful clustering in massive collections where the number of distinct topics grows linearly with collection size. These fine-grained clusters show an improved cluster quality when assessed with two novel evaluations using ad hoc search relevance judgments and spam classifications for external validation. These evaluations solve the problem of assessing the quality of clusters where categorical labeling is unavailable and unfeasible.
Abstract:
Background: The capacity to diagnose, quantify and evaluate movement beyond the general confines of a clinical environment under effectiveness conditions may alleviate rampant strain on limited, expensive and highly specialized medical resources. The iPhone 4® incorporates a three-dimensional accelerometer subsystem with highly robust software applications. The present study aimed to evaluate the reliability and concurrent criterion-related validity of the accelerations measured with an iPhone 4® in an Extended Timed Get Up and Go test. The Extended Timed Get Up and Go is a clinical test in which the patient gets up from a chair, walks ten meters, turns and comes back to the chair. Methods: A repeated-measures, cross-sectional, analytical study. Test-retest reliability of the kinematic measurements of the iPhone 4® was assessed by comparison with a standard validated laboratory device. We calculated the Coefficient of Multiple Correlation between the two sensors' acceleration signals for each subject, in each sub-stage, in each of the three Extended Timed Get Up and Go test trials. To investigate statistical agreement between the two sensors we used the Bland-Altman method. Results: The Coefficient of Multiple Correlation of the five subjects in their triplicate trials was as follows: in the Sit to Stand sub-phase it ranged from r = 0.991 to 0.842; in Gait Go, r = 0.967 to 0.852; in Turn, 0.979 to 0.798; in Gait Come, 0.964 to 0.887; and in Turn to Stand to Sit, 0.992 to 0.877. All the correlations between the sensors were significant (p < 0.001). The Bland-Altman plots showed a solid tendency to stay close to zero, especially on the y and x-axes, during the five phases of the Extended Timed Get Up and Go test. Conclusions: The inertial sensor mounted in the iPhone 4® is sufficiently reliable and accurate to evaluate and identify the kinematic patterns in an Extended Timed Get Up and Go test. While analysis and interpretation of 3D kinematics data continue to be dauntingly complex, the iPhone 4® makes the task of acquiring the data relatively inexpensive and easy.
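The abstract above relies on the Bland-Altman method to assess agreement between two sensors. As a minimal sketch of the standard calculation behind such plots, the code below computes the mean difference (bias) and the conventional 95% limits of agreement (bias ± 1.96 standard deviations of the differences). The function name `bland_altman` and the sample readings are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def bland_altman(readings_a, readings_b):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    bias is the mean of the paired differences; the limits of agreement
    are bias -/+ 1.96 sample standard deviations of those differences.
    """
    if len(readings_a) != len(readings_b):
        raise ValueError("both devices must provide the same number of readings")
    if len(readings_a) < 2:
        raise ValueError("at least two paired readings are required")
    diffs = [a - b for a, b in zip(readings_a, readings_b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired acceleration readings from two devices.
phone = [1.0, 2.0, 3.0, 4.0]
reference = [1.1, 1.9, 3.2, 3.8]
bias, lower, upper = bland_altman(phone, reference)
print(bias, lower, upper)
```

A bias near zero with narrow limits suggests that the two devices agree closely. In a full Bland-Altman plot, each difference would also be plotted against the mean of the corresponding pair.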
Abstract:
Background: Balance dysfunction is one of the most common problems in people who suffer a stroke. The parameterization of standardized functional tests using inertial sensors has been promoted in applied medicine. The aim of this study was to compare the kinematic variables of the Functional Reach Test (FRT) obtained by two inertial sensors placed on the trunk and lumbar region between stroke survivors (SS) and healthy older adults (HOA) and to analyze the reliability of the kinematic measurements obtained. Methods: Cross-sectional study of five SS and five HOA over 65. A descriptive analysis of the average range and of all kinematic variables recorded was carried out. The intrasubject and intersubject reliability of the measured variables was directly calculated. Results: In the same intervals, the angular displacement was greater in the HOA group; however, the intervals were completed in similar times by both groups, and HOA conducted the test at a higher speed and greater acceleration in each of the intervals. The SS values were higher than HOA values in the maximum and minimum acceleration in the trunk and in the lumbar region. Conclusions: Compared with HOA, the SS show less functional reach and a narrower, slower and less accelerated movement during FRT execution, but with higher peaks of acceleration and speed.
Abstract:
We propose a novel technique for conducting robust voice activity detection (VAD) in high-noise recordings. We use Gaussian mixture modeling (GMM) to train two generic models: speech and non-speech. We then score smaller segments of a given (unseen) recording against each of these GMMs to obtain two respective likelihood scores for each segment. These scores are used to compute a dissimilarity measure between pairs of segments and to carry out complete-linkage clustering of the segments into speech and non-speech clusters. We compare the accuracy of our method against state-of-the-art and standardised VAD techniques to demonstrate an absolute improvement of 15% in half-total error rate (HTER) over the best performing baseline system across the QUT-NOISE-TIMIT database. We then apply our approach to the Audio-Visual Database of American English (AVDBAE) to demonstrate the performance of our algorithm in using visual, audio-visual or a proposed fusion of these features.
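The clustering step described above can be illustrated with a small sketch of complete-linkage agglomerative clustering over per-segment score pairs. This is not the authors' implementation. The abstract does not specify the dissimilarity measure, so Euclidean distance between the two scores is assumed here, and the function name `complete_linkage` and the example scores are hypothetical.

```python
from math import dist


def complete_linkage(vectors, n_clusters=2):
    """Agglomerative clustering with complete linkage.

    vectors: one tuple per segment, e.g. (speech_score, non_speech_score).
    Returns a list of clusters, each a list of segment indices.
    Dissimilarity is Euclidean distance (an assumption for this sketch).
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        # Complete linkage: distance between the two farthest members.
        return max(dist(vectors[i], vectors[j]) for i in c1 for j in c2)

    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters with the smallest complete-linkage distance.
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


# Hypothetical log-likelihood scores per segment: (speech model, non-speech model).
segment_scores = [
    (-10.0, -30.0),
    (-11.0, -29.0),
    (-28.0, -12.0),
    (-30.0, -10.0),
    (-9.5, -31.0),
]
print(complete_linkage(segment_scores, n_clusters=2))
```

This naive version recomputes every pairwise linkage at each merge, which is adequate for a handful of segments. A long recording would call for a more efficient implementation.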