938 results for Image Processing, Microscopy, Histopathology, Classification, K-means


Relevance:

100.00%

Publisher:

Abstract:

This paper proposes a cluster guide searching (CGS) method for path planning. A K-means clustering method based on max-min distances is used to cluster samples offline, and the learning results are stored as knowledge of the form "similar environments, similar decisions". During path planning, the robot organizes environmental information online to obtain input-space samples, matches them against the knowledge base to retrieve the nearest class, and then searches the output space within that class, alternating between a speed-first strategy and a direction-first strategy. If incomplete knowledge causes retrieval to fail, a linear programming (LP) algorithm can be restarted for online path planning and the clustering knowledge base updated. Simulation results show that the method is an effective learning approach to path planning.
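
A minimal sketch of the max-min-distance seeding named above, in Python; the sample data, dimensionality, and k are hypothetical, and this is not the authors' code.

```python
import numpy as np

def maximin_seeds(X, k, rng=None):
    """Max-min-distance seeding: pick the first seed at random, then
    repeatedly pick the point farthest from all seeds chosen so far."""
    rng = np.random.default_rng(rng)
    seeds = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # distance of every point to its nearest already-chosen seed
        dists = np.min([np.linalg.norm(X - s, axis=1) for s in seeds], axis=0)
        seeds.append(X[np.argmax(dists)])
    return np.array(seeds)

X = np.random.default_rng(0).random((200, 4))  # toy environment samples
centers = maximin_seeds(X, k=5, rng=0)         # initial centers for K-means
```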

Relevance:

100.00%

Publisher:

Abstract:

An improved region-based moving-object segmentation method is proposed for video surveillance systems. Compared with traditional approaches, the motion detection stage combines temporal differencing with background differencing and updates the background adaptively; when binarizing the difference image, an adaptive threshold replaces the traditional manually chosen threshold; and for region segmentation, a K-means clustering algorithm based on a weighted squared Euclidean distance replaces the conventional K-means algorithm. Experimental results show that the improved method offers better real-time performance, robustness, and effectiveness than the traditional method.
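
A minimal sketch of K-means under a per-feature weighted squared Euclidean distance; the features, weights, and toy data are assumptions, since the abstract does not give the exact weighting scheme.

```python
import numpy as np

def weighted_kmeans(X, k, w, iters=50, seed=0):
    """K-means with distance d(x, c) = sum_j w_j * (x_j - c_j)**2."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # weighted squared Euclidean distance of each point to each center
        d = (((X[:, None, :] - C[None, :, :]) ** 2) * w).sum(axis=2)
        labels = d.argmin(axis=1)
        C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                      else C[j] for j in range(k)])
    return labels, C

# toy pixel features, e.g. (row, col, intensity), weighting intensity higher
X = np.random.default_rng(1).random((500, 3))
labels, centers = weighted_kmeans(X, k=4, w=np.array([1.0, 1.0, 2.0]))
```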

Relevance:

100.00%

Publisher:

Abstract:

Clare, A. and King, R.D. (2002) How well do we understand the clusters found in microarray data? In Silico Biol. 2, 0046.

Relevance:

100.00%

Publisher:

Abstract:

A new approach is proposed for clustering time-series data. The approach can be used to discover groupings of similar object motions that were observed in a video collection. A finite mixture of hidden Markov models (HMMs) is fitted to the motion data using the expectation-maximization (EM) framework. Previous approaches for HMM-based clustering employ a k-means formulation, where each sequence is assigned to only a single HMM. In contrast, the formulation presented in this paper allows each sequence to belong to more than a single HMM with some probability, and the hard decision about the sequence class membership can be deferred until a later time when such a decision is required. Experiments with simulated data demonstrate the benefit of using this EM-based approach when there is more "overlap" in the processes generating the data. Experiments with real data show the promising potential of HMM-based motion clustering in a number of applications.
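
The soft assignment that distinguishes this formulation from k-means-style HMM clustering is an E-step over per-sequence log-likelihoods. A minimal sketch, assuming the log-likelihoods come from already-fitted component HMMs (the numbers below are made up):

```python
import numpy as np
from scipy.special import logsumexp

def soft_memberships(loglik, log_pi):
    """E-step of an HMM mixture: loglik[i, m] = log p(seq_i | HMM_m),
    log_pi[m] = log mixture weight of HMM_m. Returns P(HMM_m | seq_i)."""
    log_post = loglik + log_pi  # unnormalized log posteriors
    return np.exp(log_post - logsumexp(log_post, axis=1, keepdims=True))

# three toy sequences scored under two component HMMs
loglik = np.array([[-10.0, -12.0], [-20.0, -19.0], [-5.0, -5.0]])
R = soft_memberships(loglik, np.log([0.5, 0.5]))
# Rows sum to 1; a hard label, when eventually needed, is R.argmax(axis=1).
```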

Relevance:

100.00%

Publisher:

Abstract:

We present a highly accurate method for classifying web pages based on link percentage, which is the percentage of text characters that are parts of links normalized by the number of all text characters on a web page. K-means clustering is used to create unique thresholds to differentiate index pages and article pages on individual web sites. Index pages contain mostly links to articles and other indices, while article pages contain mostly text. We also present a novel link grouping algorithm using agglomerative hierarchical clustering that groups links in the same spatial neighborhood together while preserving link structure. Grouping allows users with severe disabilities to use a scan-based mechanism to tab through a web page and select items. In experiments, we saw up to a 40-fold reduction in the number of commands needed to click on a link with a scan-based interface, which shows that we can vastly improve the rate of communication for users with disabilities. We used web page classification and link grouping to alter web page display on an accessible web browser that we developed to make a usable browsing interface for users with disabilities. Our classification method consistently outperformed a baseline classifier even when using minimal data to generate article and index clusters, and achieved classification accuracy of 94.0% on web sites with well-formed or slightly malformed HTML, compared with 80.1% accuracy for the baseline classifier.
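
A minimal sketch of the per-site thresholding step, assuming the threshold is taken as the midpoint between the two cluster centers (the paper's exact rule may differ):

```python
import numpy as np
from sklearn.cluster import KMeans

def site_threshold(link_pct):
    """Cluster a site's per-page link percentages into two groups and
    return a threshold separating index pages from article pages."""
    km = KMeans(n_clusters=2, n_init=10, random_state=0)
    km.fit(np.asarray(link_pct).reshape(-1, 1))
    lo, hi = np.sort(km.cluster_centers_.ravel())
    return (lo + hi) / 2  # above: index page, below: article page

# hypothetical link percentages for six pages on one site
print(site_threshold([0.05, 0.08, 0.12, 0.70, 0.78, 0.82]))
```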

Relevance:

100.00%

Publisher:

Abstract:

Strategic reviews of the Irish Food and Beverage Industry have consistently emphasised the need for food and beverage firms to improve their innovation and marketing capabilities, in order to maintain competitiveness in both domestic and overseas markets. In particular, the functional food and beverages market has been singled out as an extremely important emerging market, which Irish firms could benefit from through an increased technological and market orientation. Although health and wellness have been the most significant drivers of new product development (NPD) in recent years, failure rates for new functional foods and beverages have been reportedly high. In that context, researchers in the US, UK, Denmark and Ireland have reported a marked divergence between NPD practices within food and beverage firms and normative advice for successful product development. The high reported failure rates for new functional foods and beverages suggest a failure to manage customer knowledge effectively, as well as a lack of knowledge management between functional disciplines involved in the NPD process. This research explored the concept of managing customer knowledge at the early stages of the NPD process, and applied it to the development of a range of functional beverages, through the use of advanced concept optimisation research techniques, which provided for a more market-oriented approach to new food product development. A sequential exploratory research design strategy using mixed research methods was chosen for this study. First, the qualitative element of this research investigated customers’ choice motives for orange juice and soft drinks, and explored their attitudes and perceptions towards a range of new functional beverage concepts through a combination of 15 in-depth interviews and 3 focus groups. Second, the quantitative element of this research consisted of 3 conjoint-based questionnaires administered to 400 different customers in each study in order to model their purchase preferences for chilled nutrient-enriched and probiotic orange juices, and stimulant soft drinks. The in-depth interviews identified the key product design attributes that influenced customers’ choice motives for orange juice. The focus group discussions revealed that groups of customers were negative towards the addition of certain functional ingredients to natural foods and beverages. K-means cluster analysis was used to quantitatively identify segments of customers with similar preferences for chilled nutrient-enriched and probiotic orange juices, and stimulant soft drinks. Overall, advanced concept optimisation research methods facilitate the integration of the customer at the early stages of the NPD process, which promotes a multi-disciplinary approach to new food product design. This research illustrated how advanced concept optimisation research methods could contribute towards effective and efficient knowledge management in the new food product development process.

Relevance:

100.00%

Publisher:

Abstract:

The identification and classification of network traffic and protocols is a vital step in many quality of service and security systems. Traffic classification strategies must evolve, alongside the protocols utilising the Internet, to overcome the use of ephemeral or masquerading port numbers and transport layer encryption. This research expands the concept of using machine learning on the initial statistics of a packet flow to determine its underlying protocol. Recognising the need for efficient training/retraining of a classifier and the requirement for fast classification, the authors investigate a new application of k-means clustering referred to as 'two-way' classification. The 'two-way' classification uniquely analyses a bidirectional flow as two unidirectional flows and is shown, through experiments on real network traffic, to improve classification accuracy by as much as 18% when measured against similar proposals. It achieves this accuracy while generating fewer clusters, that is, fewer comparisons are needed to classify a flow. A 'two-way' classification offers a new way to improve accuracy and efficiency of machine learning statistical classifiers while still maintaining the fast training times associated with k-means.
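
A minimal sketch of the 'two-way' idea: each bidirectional flow contributes two unidirectional feature vectors before k-means is applied. The features (mean and standard deviation of packet sizes, packet count) and the toy data are assumptions, not the authors' exact feature set.

```python
import numpy as np
from sklearn.cluster import KMeans

def unidirectional_features(pkts):
    """pkts: (direction, size) pairs from the start of one flow.
    Returns two feature vectors, one per direction, so a
    bidirectional flow is analysed as two unidirectional flows."""
    feats = []
    for d in (0, 1):
        sizes = [s for dd, s in pkts if dd == d]
        n = len(sizes)
        sizes = np.array(sizes, dtype=float) if n else np.zeros(1)
        feats.append([sizes.mean(), sizes.std(), n])
    return feats

rng = np.random.default_rng(0)
flows = [[(int(rng.integers(2)), int(rng.integers(40, 1500)))
          for _ in range(10)] for _ in range(100)]  # toy flows
X = np.array([f for flow in flows for f in unidirectional_features(flow)])
km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(X)
# Each cluster would then be labelled with the protocol that dominates
# the training flows assigned to it.
```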

Relevance:

100.00%

Publisher:

Abstract:

The environmental quality of land can be assessed by calculating relevant threshold values, which differentiate between concentrations of elements resulting from geogenic and diffuse anthropogenic sources and concentrations generated by point sources of elements. A simple process allowing the calculation of these typical threshold values (TTVs) was applied across a region of highly complex geology (Northern Ireland) to six elements of interest; arsenic, chromium, copper, lead, nickel and vanadium. Three methods for identifying domains (areas where a readily identifiable factor can be shown to control the concentration of an element) were used: k-means cluster analysis, boxplots and empirical cumulative distribution functions (ECDF). The ECDF method was most efficient at determining areas of both elevated and reduced concentrations and was used to identify domains in this investigation. Two statistical methods for calculating normal background concentrations (NBCs) and upper limits of geochemical baseline variation (ULBLs), currently used in conjunction with legislative regimes in the UK and Finland respectively, were applied within each domain. The NBC methodology was constructed to run within a specific legislative framework, and its use on this soil geochemical data set was influenced by the presence of skewed distributions and outliers. In contrast, the ULBL methodology was found to calculate more appropriate TTVs that were generally more conservative than the NBCs. TTVs indicate what a "typical" concentration of an element would be within a defined geographical area and should be considered alongside the risk that each of the elements pose in these areas to determine potential risk to receptors.
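
A minimal sketch of the ECDF comparison used to identify domains; the domain names and concentration values below are hypothetical:

```python
import numpy as np

def ecdf(sample):
    """Empirical cumulative distribution function of a sample."""
    xs = np.sort(sample)
    return xs, np.arange(1, len(xs) + 1) / len(xs)

# hypothetical soil concentrations (mg/kg) over two candidate domains
rng = np.random.default_rng(0)
domain_a = rng.lognormal(mean=3.5, sigma=0.4, size=300)  # elevated
domain_b = rng.lognormal(mean=2.8, sigma=0.4, size=300)

for name, data in [("domain A", domain_a), ("domain B", domain_b)]:
    xs, F = ecdf(data)
    # a domain with elevated concentrations has its ECDF shifted right
    print(name, "median:", round(float(np.interp(0.5, F, xs)), 1))
```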

Relevance:

100.00%

Publisher:

Abstract:

We present a novel method for the light-curve characterization of Pan-STARRS1 Medium Deep Survey (PS1 MDS) extragalactic sources into stochastic variables (SVs) and burst-like (BL) transients, using multi-band image-differencing time-series data. We select detections in difference images associated with galaxy hosts using a star/galaxy catalog extracted from the deep PS1 MDS stacked images, and adopt a maximum a posteriori formulation to model their difference-flux time-series in four Pan-STARRS1 photometric bands gP1, rP1, iP1, and zP1. We use three deterministic light-curve models to fit BL transients: a Gaussian, a Gamma distribution, and an analytic supernova (SN) model; and one stochastic light-curve model, the Ornstein-Uhlenbeck process, in order to fit variability that is characteristic of active galactic nuclei (AGNs). We assess the quality of fit of the models band-wise and source-wise, using their estimated leave-one-out cross-validation likelihoods and corrected Akaike information criteria. We then apply a K-means clustering algorithm on these statistics, to determine the source classification in each band. The final source classification is derived as a combination of the individual filter classifications, resulting in two measures of classification quality, from the averages across the photometric filters of (1) the classifications determined from the closest K-means cluster centers, and (2) the square distances from the clustering centers in the K-means clustering spaces. For a verification set of AGNs and SNe, we show that SV and BL occupy distinct regions in the plane constituted by these measures. We use our clustering method to characterize 4361 extragalactic sources detected in difference images, in the first 2.5 yr of the PS1 MDS, into 1529 BL and 2262 SV, with a purity of 95.00% for AGNs and 90.97% for SNe based on our verification sets. We combine our light-curve classifications with their nuclear or off-nuclear host galaxy offsets, to define a robust photometric sample of 1233 AGNs and 812 SNe. With these two samples, we characterize their variability and host galaxy properties, and identify simple photometric priors that would enable their real-time identification in future wide-field synoptic surveys.
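
A minimal sketch of the final clustering step, with two synthetic summary statistics per source standing in for the cross-validation likelihood and corrected-AIC terms; the numbers and cluster structure are made up:

```python
import numpy as np
from sklearn.cluster import KMeans

# hypothetical per-band fit statistics for each source (one row each)
rng = np.random.default_rng(0)
stats = np.vstack([rng.normal([-5.0, -2.0], 1.0, (50, 2)),   # BL-like
                   rng.normal([+5.0, +2.0], 1.0, (50, 2))])  # SV-like

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(stats)
labels = km.labels_
# As in the abstract, per-source quality could combine (1) the label of
# the nearest cluster center and (2) the squared distance to it.
dist2 = ((stats - km.cluster_centers_[labels]) ** 2).sum(axis=1)
```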

Relevance:

100.00%

Publisher:

Abstract:

Energy efficiency is an essential requirement for all contemporary computing systems. We thus need tools to measure the energy consumption of computing systems and to understand how workloads affect it. Significant recent research effort has targeted direct power measurements on production computing systems using on-board sensors or external instruments. These direct methods have in turn guided studies of software techniques to reduce energy consumption via workload allocation and scaling. Unfortunately, direct energy measurements are hampered by the low power sampling frequency of power sensors. The coarse granularity of power sensing limits our understanding of how power is allocated in systems and our ability to optimize energy efficiency via workload allocation.
We present ALEA, a tool to measure power and energy consumption at the granularity of basic blocks, using a probabilistic approach. ALEA provides fine-grained energy profiling via statistical sampling, which overcomes the limitations of power sensing instruments. Compared to state-of-the-art energy measurement tools, ALEA provides finer granularity without sacrificing accuracy. ALEA achieves low overhead energy measurements with mean error rates between 1.4% and 3.5% in 14 sequential and parallel benchmarks tested on both Intel and ARM platforms. The sampling method caps execution time overhead at approximately 1%. ALEA is thus suitable for online energy monitoring and optimization. Finally, ALEA is a user-space tool with a portable, machine-independent sampling method. We demonstrate two use cases of ALEA, where we reduce the energy consumption of a k-means computational kernel by 37% and an ocean modelling code by 33%, compared to high-performance execution baselines, by varying the power optimization strategy between basic blocks.
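
A minimal sketch of the statistical-sampling idea (not ALEA's actual implementation): the fraction of periodic samples that land in a basic block estimates that block's share of the measured energy.

```python
import collections

def attribute_energy(samples, total_energy_j):
    """Statistical-sampling attribution: the share of samples observed
    in a basic block estimates its share of the total energy."""
    counts = collections.Counter(samples)
    n = sum(counts.values())
    return {bb: total_energy_j * c / n for bb, c in counts.items()}

# toy trace: program counter sampled periodically, mapped to basic blocks
samples = ["bb_main", "bb_kmeans", "bb_kmeans", "bb_kmeans", "bb_io"]
print(attribute_energy(samples, total_energy_j=50.0))
# {'bb_main': 10.0, 'bb_kmeans': 30.0, 'bb_io': 10.0}
```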

Relevance:

100.00%

Publisher:

Abstract:

In recent years we have witnessed a change in the way information is made available online. The emergence of the web for everyone made it easy to edit, publish, and share information, generating a considerable increase in its volume. Systems quickly appeared that allow this information to be collected and shared and that, besides collecting the resources, also let users describe them with tags or comments. Automatically organizing this information is one of the major challenges of the current web. Although several clustering algorithms exist, the trade-off between effectiveness (forming groups that make sense) and efficiency (running in acceptable time) is hard to achieve. This research therefore asks whether an automatic document clustering system becomes more effective when a social classification system is integrated into it. We analysed and discussed two methods, based on the k-means algorithm, for document clustering that allow social tagging to be integrated into the process. The first integrates the tags directly into the Vector Space Model; the second proposes using the tags to select the initial seeds. The first method weights the tags according to their occurrence in the document through the Social Slider parameter. It was built on a prediction model suggesting that, under cosine similarity, documents that share tags move closer together while documents that share none move further apart. The second method gave rise to an algorithm we call k-C, which, besides selecting the initial seeds through a tag network, also changes how the new centroids are computed at each iteration. The change to the centroid computation followed a reflection on the use of Euclidean distance and cosine similarity in the k-means clustering algorithm. For evaluating the algorithms, two further algorithms were proposed: the "automatic ground truth" algorithm and the MCI algorithm. The first detects the structure of the data when it is unknown, and the second is an internal evaluation measure based on the cosine similarity between each document and its nearest document. Preliminary results suggest that the first method of integrating tags into the VSM has more impact on the k-means algorithm than on the k-C algorithm. Moreover, the results show no correlation between the choice of the SS parameter and cluster quality. The remaining tests were therefore conducted using only the k-C algorithm (without tag integration in the VSM), and the results indicate that this algorithm tends to produce more effective clusters.
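
A minimal sketch of the first method's idea, with a slider-like factor ss standing in for the Social Slider (the thesis's weighting by tag occurrence in the document is simplified here, and tag_weighted_vsm is a hypothetical helper, not the author's code):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def tag_weighted_vsm(texts, tags, ss=2.0):
    """Build a tf-idf space over texts plus tags, upweight tag terms by
    a factor ss, and renormalize rows for cosine similarity."""
    vec = TfidfVectorizer()
    X = vec.fit_transform(t + " " + g for t, g in zip(texts, tags)).toarray()
    tag_terms = set(w for g in tags for w in g.split())
    idx = [i for w, i in vec.vocabulary_.items() if w in tag_terms]
    X[:, idx] *= ss  # tag dimensions pull tagged documents together
    return X / np.linalg.norm(X, axis=1, keepdims=True)

texts = ["clustering of web documents", "social tagging systems",
         "k-means and document clustering"]
tags = ["clustering", "tags", "clustering"]  # toy social tags
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    tag_weighted_vsm(texts, tags))
```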

Relevance:

100.00%

Publisher:

Abstract:

Clustering and Disjoint Principal Component Analysis (CDPCA) is a constrained principal component analysis, recently proposed for simultaneously clustering objects and partitioning variables, which we have implemented in the R language. In this paper, we deal in detail with the alternating least-squares algorithm for CDPCA and highlight its algebraic features for constructing both interpretable principal components and clusters of objects. Two applications are given to illustrate the capabilities of this new methodology.

Relevance:

100.00%

Publisher:

Abstract:

The decision process for evaluating a commercial credit application is sometimes difficult for human judgement, given the multitude of variables at play and their interrelations. In this article we set out to identify the client characteristics associated with high and low risk, using an applied model. From a credit card database containing both qualitative and quantitative variables, we fitted a binary logit model in order to make the decision process more objective and quantifiable. We then identified eight risk classes by applying a non-hierarchical classification method (K-means) to the score vector of the logit model. We tracked the behaviour of each risk class over 70 months, finding that low default probabilities are associated with low-risk classes. The client characteristics typically associated with credit risk were identified through a Correspondence Analysis.
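
A minimal sketch of the two-stage procedure (synthetic data; the real model used qualitative and quantitative card-holder attributes):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))  # hypothetical client attributes
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=1000) > 1).astype(int)

# 1) binary logit: estimated default probability per client
score = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]

# 2) non-hierarchical classification (K-means) on the score vector
risk_class = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(
    score.reshape(-1, 1))
# classes can then be ranked by mean score, from low to high risk
```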

Relevance:

100.00%

Publisher:

Abstract:

Master's dissertation, Marketing, Faculdade de Economia, Universidade do Algarve, 2015

Relevance:

100.00%

Publisher:

Abstract:

Virtual social networks are a potentially fast and economical means of promoting a business: they generate leads, exposure for the business, market information, and website traffic; they support marketing, word-of-mouth recommendation, direct marketing, brand management, and data mining/research; and they facilitate the outsourcing of design/development, research, content creation, and community management tasks. The study was based on a questionnaire posted on virtual social networks and in the dissemination group of the Association for Information Systems from 12 April to 14 June 2012, yielding 450 responses, of which 330 were valid. Responses came from all over the world, predominantly from Portugal (61.33%) and Brazil (10.89%), and led to the conclusion that Facebook (78.51%) and LinkedIn (71.99%) are perceived as the most useful virtual social networks for business promotion. To better understand how users of virtual social networks perceive the advantages and opportunities these networks offer for business promotion, cluster analysis was applied; the k-means solution proved the most stable and the easiest to interpret, allowing users to be segmented into three clusters: Cluster 1 ("most pessimistic"), Cluster 2 ("intermediate"), and Cluster 3 ("most optimistic"). This segmentation reveals correlations between the variables group, address, gender, field of study, employment status, and number of business employees and the different segments. Additionally, correlations were found between the variables group, address, gender, field of study, and employment status and the variable hours/week spent using virtual social networks for business promotion. We hope this work contributes to identifying and developing methods and strategies that enhance business promotion on virtual social networks.