359 results for Clustering techniques
Abstract:
Digital collections are growing exponentially in size as the information age takes a firm grip on all aspects of society. As a result Information Retrieval (IR) has become an increasingly important area of research. It promises to provide new and more effective ways for users to find information relevant to their search intentions. Document clustering is one of the many tools in the IR toolbox and is far from being perfected. It groups documents that share common features. This grouping allows a user to quickly identify relevant information. If these groups are misleading then valuable information can accidentally be ignored. Therefore, the study and analysis of the quality of document clustering is important. With more and more digital information available, the performance of these algorithms is also of interest. An algorithm with a time complexity of O(n²) can quickly become impractical when clustering a corpus containing millions of documents. Therefore, the investigation of algorithms and data structures to perform clustering in an efficient manner is vital to its success as an IR tool. Document classification is another tool frequently used in the IR field. It predicts categories of new documents based on an existing database of (document, category) pairs. Support Vector Machines (SVM) have been found to be effective when classifying text documents. As the algorithms for classification are both efficient and of high quality, the largest gains can be made from improvements to representation. Document representations are vital for both clustering and classification. Representations exploit the content and structure of documents. Dimensionality reduction can improve the effectiveness of existing representations in terms of quality and run-time performance. Research into these areas is another way to improve the efficiency and quality of clustering and classification results. Evaluating document clustering is a difficult task. 
Intrinsic measures of quality such as distortion only indicate how well an algorithm minimised a similarity function in a particular vector space. Intrinsic comparisons are inherently limited by the given representation and are not comparable between different representations. Extrinsic measures of quality compare a clustering solution to a “ground truth” solution. This allows comparison between different approaches. As the “ground truth” is created by humans it can suffer from the fact that not every human interprets a topic in the same manner. Whether a document belongs to a particular topic or not can be subjective.
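The contrast between intrinsic and extrinsic measures can be illustrated with a minimal sketch. It assumes distortion is the sum of squared Euclidean distances to the assigned centroid and uses purity as one common extrinsic measure; the abstract does not name the measures actually used, and the points, labels and categories below are hypothetical.

```python
from collections import Counter


def distortion(points, labels, centroids):
    """Intrinsic: sum of squared distances from each point to its assigned centroid."""
    return sum(
        sum((p - c) ** 2 for p, c in zip(point, centroids[label]))
        for point, label in zip(points, labels)
    )


def purity(labels, truth):
    """Extrinsic: fraction of documents in the majority ground-truth class of their cluster."""
    clusters = {}
    for label, category in zip(labels, truth):
        clusters.setdefault(label, Counter())[category] += 1
    majority = sum(max(counts.values()) for counts in clusters.values())
    return majority / len(labels)


# Hypothetical toy data: four documents in a two-dimensional vector space.
points = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]
labels = [0, 0, 1, 1]                      # cluster assignment per document
centroids = {0: (0.0, 1.0), 1: (4.0, 1.0)}
truth = ["sport", "sport", "sport", "politics"]  # human-assigned categories

print(distortion(points, labels, centroids))  # 4.0
print(purity(labels, truth))                  # 0.75
```

Distortion depends on the vector space the points live in, so it cannot be compared across representations; purity depends only on labels, so it can.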
Abstract:
An algorithm that combines Kalman filter and Least Error Square (LES) techniques is proposed in this paper. The algorithm is intended to estimate signal attributes such as amplitude, frequency and phase angle in online mode. This technique can be used in protection relays, digital AVRs, DGs, DSTATCOMs, FACTS and other power electronics applications. The Kalman filter is modified to operate on a fictitious input signal and provides precise estimation results that are insensitive to noise and other disturbances. At the same time, the LES system is arranged to operate in critical transient cases to compensate for the delay and inaccuracy caused by the response of the standard Kalman filter. Practical considerations such as the effect of noise, higher order harmonics, and computational issues of the algorithm are considered and tested in the paper. Several computer simulations and a laboratory test are presented to highlight the usefulness of the proposed method. Simulation results show that the proposed technique can simultaneously estimate the signal attributes, even if the signal is highly distorted due to the presence of non-linear loads and noise.
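The least-error-squares part of such an estimator can be sketched as follows. This is not the paper's combined Kalman/LES algorithm: it is a plain least-squares fit that assumes the frequency is already known and estimates only amplitude and phase. The function name `les_amplitude_phase` and the test signal are hypothetical.

```python
import math


def les_amplitude_phase(samples, freq_hz, sample_rate_hz):
    """Fit x(t) = a*sin(wt) + b*cos(wt) by least squares; return (amplitude, phase)."""
    w = 2.0 * math.pi * freq_hz
    ss = sc = cc = xs = xc = 0.0
    for k, x in enumerate(samples):
        t = k / sample_rate_hz
        s, c = math.sin(w * t), math.cos(w * t)
        ss += s * s
        sc += s * c
        cc += c * c
        xs += x * s
        xc += x * c
    # Solve the 2x2 normal equations by Cramer's rule.
    det = ss * cc - sc * sc
    a = (xs * cc - xc * sc) / det
    b = (ss * xc - sc * xs) / det
    # a = A*cos(phi), b = A*sin(phi) for x(t) = A*sin(wt + phi).
    return math.hypot(a, b), math.atan2(b, a)


# Hypothetical noise-free test signal: 50 Hz, amplitude 2.0, phase 0.5 rad,
# sampled at 1 kHz over two full cycles.
sample_rate_hz = 1000.0
freq_hz = 50.0
samples = [
    2.0 * math.sin(2.0 * math.pi * freq_hz * k / sample_rate_hz + 0.5)
    for k in range(40)
]
amplitude, phase = les_amplitude_phase(samples, freq_hz, sample_rate_hz)
print(amplitude, phase)  # close to 2.0 and 0.5 for this noise-free input
```

A batch fit like this responds within one data window, which is the kind of fast transient behaviour the abstract attributes to the LES stage.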
Abstract:
This paper discusses the role of advanced techniques for monitoring urban growth and change for sustainable development of the urban environment. It also presents results of a case study involving satellite data for land use/land cover classification of Lucknow city using IRS-1C multi-spectral features. Two classification algorithms have been used in the study. Experiments were conducted to see the level of improvement in digital classification of the urban environment using the Artificial Neural Network (ANN) technique.
Abstract:
PURPOSE. To measure tear film surface quality in healthy and dry eye subjects using three non-invasive techniques of tear film quality assessment and to establish the ability of these non-invasive techniques to predict dry eye. METHODS. Thirty-four subjects participated in the study and were classified as dry eye or normal, based on standard clinical assessments. Three non-invasive techniques were applied for measurement of tear film surface quality: dynamic-area high-speed videokeratoscopy (HSV), wavefront sensing (DWS) and lateral shearing interferometry (LSI). The measurements were performed in both natural blinking conditions (NBC) and suppressed blinking conditions (SBC). RESULTS. To investigate the capability of each method to discriminate dry eye subjects from normal subjects, the receiver operating characteristic (ROC) curve was calculated and the area under the curve (AUC) was extracted. The best result was obtained for the LSI technique (AUC=0.80 in SBC and AUC=0.73 in NBC), followed by HSV (AUC=0.72 in SBC and AUC=0.71 in NBC). The best result for DWS was AUC=0.64, obtained for changes in vertical coma in suppressed blinking conditions, while for natural blinking conditions the results were poorer. CONCLUSIONS. Non-invasive techniques of tear film surface assessment can be used for predicting dry eye, and this can be achieved in natural blinking as well as suppressed blinking conditions. In this study, LSI showed the best detection performance, closely followed by the dynamic-area HSV. The wavefront sensing technique was less powerful, particularly in natural blinking conditions.
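The AUC figures reported above can be read as the probability that a randomly chosen dry eye subject scores higher than a randomly chosen normal subject. A minimal sketch of that computation follows; the scores are hypothetical and are not taken from the study.

```python
def auc(scores_positive, scores_negative):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical tear film quality scores (higher = more irregular surface).
dry_eye_scores = [0.9, 0.8, 0.6, 0.4]
normal_scores = [0.7, 0.5, 0.3, 0.2]

print(auc(dry_eye_scores, normal_scores))  # 0.8125
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.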
Abstract:
Breast conservation therapy (BCT) is the procedure of choice for the management of early-stage breast cancer. However, its utilization has not been maximized because of logistical issues associated with the protracted course of radiation treatment. Accelerated Partial Breast Irradiation (APBI) is an approach that treats only the lumpectomy bed plus a 1-2 cm margin, rather than the whole breast. Hence, because of the small volume of irradiation, a higher dose can be delivered in a shorter period of time. There has been growing interest in APBI, and various approaches have been developed under phase I-III clinical studies; these include multicatheter interstitial brachytherapy, balloon catheter brachytherapy, conformal external beam radiation therapy and intra-operative radiation therapy (IORT). Balloon-based brachytherapy approaches include Mammosite, Axxent electronic brachytherapy and Contura; hybrid brachytherapy devices include SAVI and ClearPath. This paper reviews the different techniques, identifies the strengths and weaknesses of each approach and proposes a direction for future research and development. It is evident that APBI will play a role in the management of a selected group of early breast cancers. However, the relative role of the different techniques is yet to be clearly identified.
Abstract:
This overview focuses on the application of chemometrics techniques for the investigation of soils contaminated by polycyclic aromatic hydrocarbons (PAHs) and metals because these two important and very diverse groups of pollutants are ubiquitous in soils. The salient features of various studies carried out in the micro- and recreational environments of humans are highlighted in the context of the various multivariate statistical techniques available across discipline boundaries that have been effectively used in soil studies. Particular attention is paid to techniques employed in the geosciences that may be effectively utilized for environmental soil studies; classical multivariate approaches that may be used in isolation or as complementary methods to these are also discussed. Chemometrics techniques widely applied in atmospheric studies for identifying sources of pollutants, or for determining the importance of contaminant source contributions to a particular site, have seen little use in soil studies but may be effectively employed in such investigations. Suitable programs are also available for suggesting mitigating measures in cases of soil contamination, and these are also considered. Specific techniques reviewed include pattern recognition techniques such as Principal Components Analysis (PCA), Fuzzy Clustering (FC) and Cluster Analysis (CA); geostatistical tools include variograms, Geographical Information Systems (GIS), contour mapping and kriging; source identification and contribution estimation methods reviewed include Positive Matrix Factorisation (PMF) and Principal Component Analysis on Absolute Principal Component Scores (PCA/APCS). Mitigating measures to limit or eliminate pollutant sources may be suggested through the use of ranking analysis and multi-criteria decision-making (MCDM) methods. 
These methods are mainly represented in this review by studies employing the Preference Ranking Organisation Method for Enrichment Evaluation (PROMETHEE) and its associated graphic output, Geometrical Analysis for Interactive Aid (GAIA).
Abstract:
Rapid prototyping (RP) is a common name for several techniques that read in data from computer-aided design (CAD) drawings and automatically manufacture three-dimensional objects layer by layer according to the virtual design. The utilization of RP in tissue engineering enables the production of three-dimensional scaffolds with complex geometries and very fine structures. Adding micro- and nanometer details into the scaffolds improves the mechanical properties of the scaffold and ensures better cell adhesion to the scaffold surface. Thus, tissue engineering constructs can be customized according to the data acquired from medical scans to match each patient’s individual needs. In addition, RP enables control of the scaffold porosity, making it possible to fabricate applications with the desired structural integrity. Unfortunately, every RP process has its own unique disadvantages in building tissue engineering scaffolds. Hence, future research should focus on the development of RP machines designed specifically for the fabrication of tissue engineering scaffolds, although RP methods can already serve as a link between tissue and engineering.
Abstract:
This paper presents a comprehensive discussion of vegetation management approaches in power line corridors based on aerial remote sensing techniques. We address three issues: 1) strategies for risk management in power line corridors, 2) selection of suitable platforms and sensor suites for data collection, and 3) progress in automated data processing techniques for vegetation management. We present initial results from a series of experiments, together with challenges and lessons learnt from our project.
Abstract:
Background: There has been a lack of investigation into the spatial distribution and clustering of suicide in Australia, where the population density is lower than many countries and varies dramatically among urban, rural and remote areas. This study aims to examine the spatial distribution of suicide at a Local Governmental Area (LGA) level and identify the LGAs with a high relative risk of suicide in Queensland, Australia, using geographical information system (GIS) techniques. Methods: Data on suicide and demographic variables in each LGA between 1999 and 2003 were acquired from the Australian Bureau of Statistics. An age standardised mortality (ASM) rate for suicide was calculated at the LGA level. GIS techniques were used to examine the geographical difference of suicide across different areas. Results: Far north and north-eastern Queensland (i.e., Cook and Mornington Shires) had the highest suicide incidence in both genders, while the south-western areas (i.e., Barcoo and Bauhinia Shires) had the lowest incidence in both genders. In different age groups (≤24 years, 25 to 44 years, 45 to 64 years, and ≥65 years), ASM rates of suicide varied with gender at the LGA level. Mornington and six other LGAs with low socioeconomic status in the upper Southeast had significant spatial clusters of high suicide risk. Conclusions: There was a notable difference in ASM rates of suicide at the LGA level in Queensland. Some LGAs had significant spatial clusters of high suicide risk. The determinants of the geographical difference of suicide should be addressed in future research.
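An age standardised mortality rate of the kind described in the Methods can be sketched as below. The sketch assumes direct standardisation, in which age-specific rates are weighted by a standard population; the abstract does not state which method was used. The counts and the standard population are hypothetical.

```python
def age_standardised_rate(deaths, population, standard_population, per=100_000):
    """Directly standardised rate: age-specific rates weighted by a standard population."""
    total_standard = sum(standard_population)
    rate = 0.0
    for d, p, s in zip(deaths, population, standard_population):
        rate += (d / p) * (s / total_standard)
    return rate * per


# Hypothetical data for one area, using the four age groups named in the abstract:
# <=24, 25 to 44, 45 to 64, >=65 years.
deaths = [2, 6, 5, 3]
population = [20_000, 30_000, 25_000, 10_000]
standard_population = [35, 30, 25, 10]  # hypothetical standard age distribution

asm = age_standardised_rate(deaths, population, standard_population)
print(round(asm, 2))  # 17.5 deaths per 100,000
```

Standardising in this way removes differences in age structure between areas, so that rates for areas with older or younger populations can be compared directly.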
Abstract:
Advances in data mining have provided techniques for automatically discovering underlying knowledge and extracting useful information from large volumes of data. Data mining offers tools for quick discovery of relationships, patterns and knowledge in large complex databases. Application of data mining to manufacturing is relatively limited, mainly because of the complexity of manufacturing data. The growing self-organizing map (GSOM) algorithm has been shown to be an efficient algorithm for unsupervised analysis of DNA data. However, it produced unsatisfactory clustering when used on some large manufacturing data. In this paper, a data mining methodology is proposed using a GSOM tool that was developed using a modified GSOM algorithm. The proposed method is used to generate clusters for good and faulty products from a manufacturing dataset. The clustering quality (CQ) measure proposed in the paper is used to evaluate the performance of the cluster maps. The paper also proposes an automatic identification of variables to find the most probable causative factor(s) that discriminate between good and faulty products by quickly examining the historical manufacturing data. The proposed method helps manufacturers to smooth the production flow and improve the quality of the products. Simulation results on small and large manufacturing data show the effectiveness of the proposed method.
Abstract:
A hierarchical structure is used to represent the content of semi-structured documents such as XML and XHTML. The traditional Vector Space Model (VSM) is not sufficient to represent both the structure and the content of such web documents. Hence, in this paper, we introduce a novel method of representing XML documents in a Tensor Space Model (TSM) and then utilize it for clustering. Empirical analysis shows that the proposed method is scalable on a real-life dataset and that the factorized matrices produced by the proposed method help to improve the quality of clusters, owing to the enriched document representation with both the structure and the content information.
Abstract:
Due to changes in attitudes and lifestyles, people nowadays expect to find new partners and friends in various ways. Online dating networks give people a place to meet each other and make contact with the aim of developing a personal, romantic or sexual relationship. Due to the higher expectations of users, online matching companies are trying to adopt recommender systems. However, existing recommendation techniques such as content-based, collaborative filtering or hybrid techniques focus on users' explicit contact behaviors but ignore the implicit relationships among users in the network. This paper proposes a social matching system that uses past relations and user similarities in finding potential matches. The proposed system is evaluated on a dataset collected from an online dating network. Empirical analysis shows that the recommendation success rate increased to 31%, compared to the baseline success rate of 19%.
Abstract:
Stereo vision is a method of depth perception, in which depth information is inferred from two (or more) images of a scene, taken from different perspectives. Applications of stereo vision include aerial photogrammetry, autonomous vehicle guidance, robotics, industrial automation and stereomicroscopy. A key issue in stereo vision is that of image matching, or identifying corresponding points in a stereo pair. The difference in the positions of corresponding points in image coordinates is termed the parallax or disparity. When the orientation of the two cameras is known, corresponding points may be projected back to find the location of the original object point in world coordinates. Matching techniques are typically categorised according to the nature of the matching primitives they use and the matching strategy they employ. This report provides a detailed taxonomy of image matching techniques, including area based, transform based, feature based, phase based, hybrid, relaxation based, dynamic programming and object space methods. A number of area based matching metrics as well as the rank and census transforms were implemented, in order to investigate their suitability for a real-time stereo sensor for mining automation applications. The requirements of this sensor were speed, robustness, and the ability to produce a dense depth map. The Sum of Absolute Differences matching metric was the least computationally expensive; however, this metric was the most sensitive to radiometric distortion. Metrics such as the Zero Mean Sum of Absolute Differences and Normalised Cross Correlation were the most robust to this type of distortion but introduced additional computational complexity. The rank and census transforms were found to be robust to radiometric distortion, in addition to having low computational complexity. They are therefore prime candidates for a matching algorithm for a stereo sensor for real-time mining applications. 
A number of issues came to light during this investigation which may merit further work. These include devising a means to evaluate and compare disparity results of different matching algorithms, and finding a method of assigning a level of confidence to a match. Another issue of interest is the possibility of statistically combining the results of different matching algorithms, in order to improve robustness.
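The behaviour of the matching metrics under radiometric distortion can be illustrated with a minimal sketch. It models the distortion as a constant brightness offset between the two images, which is only one simple case, and uses a hypothetical flattened 3x3 window; the function names are illustrative and are not the report's implementation.

```python
def sad(left, right):
    """Sum of Absolute Differences between two windows."""
    return sum(abs(a - b) for a, b in zip(left, right))


def zsad(left, right):
    """Zero Mean SAD: subtract each window's mean before comparing."""
    mean_l = sum(left) / len(left)
    mean_r = sum(right) / len(right)
    return sum(abs((a - mean_l) - (b - mean_r)) for a, b in zip(left, right))


def census(window, centre_index):
    """Census transform: one bit per neighbour, set if it is darker than the centre."""
    centre = window[centre_index]
    return tuple(
        1 if value < centre else 0
        for i, value in enumerate(window)
        if i != centre_index
    )


def hamming(a, b):
    """Number of differing bits; the matching cost for census-transformed windows."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical 3x3 windows, flattened row by row. The right window shows the
# same scene patch with a constant brightness offset of 25.
left = [10, 20, 30, 40, 50, 60, 70, 80, 90]
right = [value + 25 for value in left]

print(sad(left, right))                             # 225
print(zsad(left, right))                            # 0.0
print(hamming(census(left, 4), census(right, 4)))   # 0
```

SAD penalises the offset even though the windows show the same patch, while ZSAD and the census transform report a perfect match, because both depend only on intensities relative to the window mean or centre pixel.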