Digital Library

856 results for Data Driven Clustering

A scaleless data model for direct and progressive spatial query processing

Relevance:

30.00%

Publisher:

Abstract:

A progressive spatial query retrieves spatial data based on previous queries (e.g., to fetch data in a more restricted area at higher resolution). A direct query, on the other hand, is defined as an isolated window query. A multi-resolution spatial database system should support both progressive queries and traditional direct queries. Supporting both types of query at the same time is conceptually challenging, because direct queries favour location-based data clustering, whereas progressive queries require data fragmented and clustered by resolution. Two new scaleless data structures are proposed in this paper. Experimental results on both synthetic and real-world datasets demonstrate that query processing time with the new multi-resolution approaches is comparable to, and often better than, that of multi-representation data structures for both types of queries.
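The abstract does not describe the two scaleless structures themselves, so the following Python sketch only illustrates how the two query types differ. It assumes, hypothetically, that each object carries a `level` (the coarsest resolution at which it becomes visible, 0 being the coarsest) and that a progressive query refines a previous result over a window contained in the previous one. `SpatialObject`, `direct_query` and `progressive_query` are illustrative names, not the paper's API.

```python
from dataclasses import dataclass

# (xmin, ymin, xmax, ymax); built-in generic requires Python 3.9+.
Window = tuple[float, float, float, float]


@dataclass(frozen=True)
class SpatialObject:
    oid: int
    x: float
    y: float
    level: int  # hypothetical: coarsest resolution level at which the object appears


def in_window(obj: SpatialObject, w: Window) -> bool:
    xmin, ymin, xmax, ymax = w
    return xmin <= obj.x <= xmax and ymin <= obj.y <= ymax


def direct_query(objs: list[SpatialObject], window: Window, level: int) -> set[int]:
    """Isolated window query: every object in the window visible at `level` or coarser."""
    return {o.oid for o in objs if in_window(o, window) and o.level <= level}


def progressive_query(objs: list[SpatialObject], window: Window,
                      prev_level: int, level: int) -> set[int]:
    """Refinement query: the client already holds the objects up to `prev_level`
    for an enclosing window, so only the newly visible levels are fetched."""
    return {o.oid for o in objs
            if in_window(o, window) and prev_level < o.level <= level}
```

Under these assumptions, the earlier result restricted to the new window, combined with the output of `progressive_query`, equals `direct_query` on the new window at the new level. The direct query reads all levels of one location together, whereas the progressive query reads one band of levels, which is why each favours a different physical clustering.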

Observations in a maternity ward: Usability considerations for EHRs in an interrupt driven environment

Relevance:

30.00%

Publisher:

Abstract:

With the increasing demand on healthcare systems it is imperative that all care is provided as efficiently and effectively as possible. Technology within the medical domain offers an exciting opportunity to augment work practices in order to meet these needs. This research project explores the implications of the interrupt-driven nature of work in clinical situations on documentation within an environment that increasingly involves electronic health records (EHRs). Midwives in a busy maternity ward were observed and interviewed about the work practices they employed to document information associated with patient care. The results showed that the interrupt-driven nature of the workplace, a feature common to many healthcare settings, led to a tension between the work and the work to document the work. Further, the IT environment in which the information was collected was not designed to cater for frequent interruption of the data entry process. Several recommendations for improving the IT environment are proposed to support health professionals in documenting patient data whilst attending to the interruptions. The recommendations include timeout screens, push technology, use of handheld PDAs, and cues to augment documentation in an interrupted session. Copyright © 2008 RMIT Publishing

Efficient spatial clustering algorithm using binary tree

Relevance:

30.00%

Publisher:

Abstract:

In this paper we present an efficient k-means clustering algorithm for two-dimensional data. The proposed algorithm reorganizes the dataset into a nested binary tree. At each node, a data item is compared with only the two nearest means along each dimension and assigned to whichever of them is closer. The main idea is as follows: first, we build the nested binary tree; then we scan the data in raster order by an in-order traversal of the tree; finally, we compare the data item at each node with only the two nearest means to assign it to its intended cluster. In this way we significantly reduce the computational cost, both by reducing the number of comparisons with means and by minimizing use of the Euclidean distance formula. Our results show that our method performs clustering much faster than the classical methods. © Springer-Verlag Berlin Heidelberg 2005
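The abstract gives the assignment rule but not the layout of the nested binary tree, so this Python sketch substitutes sorted per-dimension arrays searched with `bisect` for the tree (an assumption). For each point it gathers the two means that bracket the point along each axis, then picks the closest candidate by squared distance, so no square root is taken. Points are visited in raster order (row by row). Restricting candidates this way is a heuristic: with many means it can miss the true nearest mean that exact k-means would find. `assign` and `kmeans_2d` are illustrative names.

```python
from bisect import bisect_left

Point = tuple[float, float]


def assign(points: list[Point], means: list[Point]) -> list[int]:
    """Label each point with the closest of the means bracketing it along each axis."""
    # Sort mean indices once per dimension (stands in for the paper's tree).
    orders = [sorted(range(len(means)), key=lambda i: means[i][d]) for d in range(2)]
    keys = [[means[i][d] for i in orders[d]] for d in range(2)]
    labels = [0] * len(points)
    # Visit points in raster order: by row (y), then by column (x).
    for idx in sorted(range(len(points)), key=lambda i: (points[i][1], points[i][0])):
        p = points[idx]
        candidates: set[int] = set()
        for d in range(2):
            pos = bisect_left(keys[d], p[d])
            for j in (pos - 1, pos):  # the two means bracketing p along axis d
                if 0 <= j < len(means):
                    candidates.add(orders[d][j])
        # Squared distance is enough to compare; no square root needed.
        labels[idx] = min(candidates,
                          key=lambda i: (p[0] - means[i][0]) ** 2 + (p[1] - means[i][1]) ** 2)
    return labels


def kmeans_2d(points: list[Point], means: list[Point], iters: int = 20):
    means = list(means)
    labels = assign(points, means)
    for _ in range(iters):
        new_means = []
        for k, m in enumerate(means):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_means.append((sum(x for x, _ in members) / len(members),
                                  sum(y for _, y in members) / len(members)))
            else:
                new_means.append(m)  # keep an empty cluster's mean unchanged
        if new_means == means:
            break  # converged: means did not move
        means = new_means
        labels = assign(points, means)
    return labels, means
```

Re-sorting the means on every iteration keeps the sketch short; an index structure like the paper's tree would presumably be maintained more cheaply.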

Customer information system data pre-processing with feature selection techniques for non-technical losses prediction in an electricity market

Relevance:

30.00%

Publisher:

Abstract:

Non-technical losses (NTL) identification and prediction are important tasks for many utilities. Data from a customer information system (CIS) can be used for NTL analysis. However, to perform NTL analysis accurately and efficiently, the original CIS data need to be pre-processed before any detailed NTL analysis can be carried out. In this paper, we propose a feature-selection-based method for CIS data pre-processing that extracts the most relevant information for further analysis such as clustering and classification. By removing irrelevant and redundant features, feature selection is an essential step in the data mining process: finding an optimal subset of features improves the quality of results, giving faster processing, higher accuracy and simpler results with fewer features. A detailed feature selection analysis is presented in the paper. Both time-domain and load-shape data are compared in terms of accuracy, consistency and statistical dependencies between features.
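The abstract does not specify which feature selection criteria the authors used, so the following Python sketch shows only one common, simple redundancy filter: a feature is dropped when it is constant or when its absolute Pearson correlation with an already-kept feature reaches a threshold. The feature names, the 0.95 threshold and the `drop_redundant` helper are hypothetical; judging relevance to an NTL label would need a supervised criterion that is not shown here.

```python
import math


def pearson(a: list[float], b: list[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def drop_redundant(features: dict[str, list[float]], threshold: float = 0.95) -> list[str]:
    """Greedily keep features (in insertion order) that are not constant and not
    highly correlated with any feature kept so far."""
    kept: list[str] = []
    for name, values in features.items():
        if len(set(values)) == 1:
            continue  # a constant column carries no information
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

The greedy pass keeps the first feature of each highly correlated pair, so the result depends on column order.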

A rough cluster analysis of shopping orientation data

Relevance:

30.00%

Publisher:

Abstract:

This paper describes the application of a new technique, rough clustering, to the problem of market segmentation. Rough clustering produces solutions different from those of k-means analysis because objects can be members of more than one cluster. Traditional clustering methods generate extensional descriptions of groups, showing which objects are members of each cluster. Clustering techniques based on rough set theory generate intensional descriptions, which outline the main characteristics of each cluster. In this study, a rough cluster analysis was conducted on a sample of 437 responses from a larger study of the relationship between shopping orientation (the general predisposition of consumers toward the act of shopping) and intention to purchase products via the Internet. The cluster analysis was based on five measures of shopping orientation: enjoyment, personalization, convenience, loyalty, and price. The rough clusters obtained provide interpretations of the different shopping orientations present in the data without the restriction of fitting each object into only one segment. Such descriptions can help marketers identify potential segments of consumers.
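The abstract does not give the paper's rough-set algorithm. To illustrate only the idea of multiple cluster membership, the Python sketch below uses a different, simpler rule of the kind found in rough k-means variants (a swapped-in technique, not the authors' method): an object joins the lower approximation of its nearest cluster only if no other centroid is almost as close; otherwise it is a boundary object placed in the upper approximation of every nearly-closest cluster. The `ratio` threshold of 1.2 and the function name are assumptions.

```python
import math
from typing import Optional, Sequence


def rough_membership(x: Sequence[float], centroids: list[Sequence[float]],
                     ratio: float = 1.2) -> tuple[Optional[int], set[int]]:
    """Return (lower, upper).

    lower: index of the cluster whose lower approximation contains x,
           or None when x is a boundary object.
    upper: indices of all clusters whose upper approximation contains x.
    """
    dists = [math.dist(x, c) for c in centroids]
    nearest = min(range(len(centroids)), key=dists.__getitem__)
    # Every cluster within `ratio` times the nearest distance counts as "close".
    close = {j for j, d in enumerate(dists) if d <= ratio * dists[nearest]}
    if close == {nearest}:
        return nearest, close  # unambiguous: a single-cluster member
    return None, close  # ambiguous: member of several upper approximations
```

In a segmentation setting like the one above, each `x` would be a vector of the five shopping-orientation scores; a boundary object is then reported as belonging to several segments at once rather than being forced into one.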

Efficient Data Management with Applications to IoT

Relevance:

30.00%

Publisher:

Abstract:

The Internet of Things (IoT) consists of a worldwide “network of networks,” composed of billions of interconnected heterogeneous devices denoted as things or “Smart Objects” (SOs). Significant research efforts have been dedicated to porting the experience gained in the design of the Internet to the IoT, with the goal of maximizing interoperability, using the Internet Protocol (IP) and designing specific protocols like the Constrained Application Protocol (CoAP), which have been widely accepted as drivers for the effective evolution of the IoT. This first wave of standardization can be considered successfully concluded, and we can assume that communication with and between SOs is no longer an issue. At this point, to favor the widespread adoption of the IoT, it is crucial to provide mechanisms that facilitate IoT data management and the development of services enabling real interaction with things. Several reference IoT scenarios have real-time or predictable latency requirements and involve billions of devices collecting and sending an enormous quantity of data. These features create a need for architectures specifically designed to handle this scenario, here denoted as “Big Stream”. In this thesis a new Big Stream Listener-based Graph architecture is proposed. Another important step is to build more applications around the Web model, bringing about the Web of Things (WoT). As several IoT testbeds have focused on evaluating lower-layer communication aspects, this thesis proposes a new WoT Testbed aiming to allow developers to work at a high level of abstraction, without worrying about low-level details. Finally, an innovative SOs-driven User Interface (UI) generation paradigm for mobile applications in heterogeneous IoT networks is proposed, to simplify interactions between users and things.

Perspectives and research methodologies of Social Communication in the context of the internet with Big Data and the Data Scientist specialization

Relevance:

30.00%

Publisher:

Abstract:

This work analyses Social Communication in the context of the internet and outlines new study methodologies for the field for filtering meaning, within a scientific scope, from the information flows of social networks, news media, or any other device that allows the storage of and access to structured and unstructured information. Seeking to reflect on the paths along which these information flows develop, and above all on the volume produced, the project maps the fields of meaning that this relationship shapes in research theories and practices. The general objective of this work is to contextualize the field of Social Communication within the changing and dynamic reality of the internet environment and to draw parallels with applications already successfully developed in other fields. Using the case study method, three cases were analysed under two conceptual keys, Web Sphere Analysis and Web Science, reflecting information systems contrasted in discursive and structural terms. The aim is thus to observe what Social Communication gains, through these perspectives, in the way it views its objects of study in the internet environment. The results show that pursuing new learning is a challenge for Social Communication researchers, but the feedback of information in the collaborative environment that the internet provides is a fertile path for research, because data modelling gains an analytical corpus when the set of tools promoted and driven by technology makes it possible to isolate content and to deepen the understanding of meanings and their relationships.

Minimum description length, regularisation and multi-modal data

Relevance:

30.00%

Publisher:

Abstract:

Relationships between clustering, description length, and regularisation are pointed out, motivating the introduction of a cost function with a description length interpretation and the unusual and useful property of having its minimum approximated by the densest mode of a distribution. A simple inverse kinematics example is used to demonstrate that this property can be used to select and learn one branch of a multi-valued mapping. This property is also used to develop a method for setting regularisation parameters according to the scale on which structure is exhibited in the training data. The regularisation technique is demonstrated on two real data sets, a classification problem and a regression problem.

Semi-supervised learning of hierarchical latent trait models for data visualisation

Relevance:

30.00%

Publisher:

Abstract:

An interactive hierarchical Generative Topographic Mapping (HGTM) [H_GTM] has been developed to visualise complex data sets. In this paper, we build a more general visualisation system by extending the HGTM visualisation system in 3 directions: (1) We generalize HGTM to noise models from the exponential family of distributions. The basic building block is the Latent Trait Model (LTM) developed in [Kaban_pami]. (2) We give the user a choice of initializing the child plots of the current plot in either interactive or automatic mode. In the interactive mode the user interactively selects "regions of interest" as in [H_GTM], whereas in the automatic mode an unsupervised minimum message length (MML)-driven construction of a mixture of LTMs is employed. (3) We derive general formulas for magnification factors in latent trait models. Magnification factors are a useful tool to improve our understanding of the visualisation plots, since they can highlight the boundaries between data clusters. The unsupervised construction is particularly useful when high-level plots are covered with dense clusters of highly overlapping data projections, making it difficult to use the interactive mode. Such a situation often arises when visualizing large data sets. We illustrate our approach on a toy example and apply our system to three more complex real data sets.

The impact of procurement-driven technological change on U.S. manufacturing productivity growth

Relevance:

30.00%

Publisher:

Abstract:

As we enter the 21st Century, technologies originally developed for defense purposes such as computers and satellite communications appear to have become a driving force behind economic growth in the United States. Paradoxically, almost all previous econometric models suggest that the largely defense-oriented federal industrial R&D funding that helped create these technologies had no discernible effect on U.S. industrial productivity growth. This paper addresses this paradox by stressing that defense procurement as well as federal R&D expenditures were targeted to a few narrowly defined manufacturing sub-sectors that produced high-tech weaponry. Analysis employing data from the NBER Manufacturing Productivity Database and the BEA's Input-Output tables then demonstrates that defense procurement policies did have significant effects on the productivity performance of disaggregated manufacturing industries because of a process of procurement-driven technological change.

On the use of Likert-type scales in multilevel data: Influence on aggregate variables

Relevance:

30.00%

Publisher:

Abstract:

In multilevel analyses, problems may arise when using Likert-type scales at the lowest level of analysis. Specifically, increases in variance should lead to greater censoring for the groups whose true scores fall at either end of the distribution. The current study used simulation methods to examine the influence of single-item Likert-type scale usage on ICC(1), ICC(2), and group-level correlations. Results revealed substantial underestimation of ICC(1) when using Likert-type scales with common response formats (e.g., 5 points). ICC(2) and group-level correlations were also underestimated, but to a lesser extent. Finally, the magnitude of underestimation was driven in large part by an interaction between Likert-type scale usage and the amounts of within- and between-group variance. © Sage Publications.
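For readers unfamiliar with the two indices, the Python sketch below computes them from one-way ANOVA mean squares using the commonly cited formulas ICC(1) = (MSB - MSW) / (MSB + (k - 1) * MSW) and ICC(2) = (MSB - MSW) / MSB, where MSB and MSW are the between- and within-group mean squares and k is the group size. It assumes equal group sizes. The abstract does not state the study's simulation settings, so `to_likert` (equal-width binning of a continuous true score onto a 1-to-`points` scale, with the range [-2, 2] an arbitrary choice) is only a hypothetical way to set up the kind of comparison described.

```python
from statistics import mean


def icc(groups: list[list[float]]) -> tuple[float, float]:
    """ICC(1) and ICC(2) from a one-way ANOVA, assuming equal group sizes k."""
    n_groups, k = len(groups), len(groups[0])
    if any(len(g) != k for g in groups):
        raise ValueError("this sketch assumes equal group sizes")
    group_means = [mean(g) for g in groups]
    grand = mean(x for g in groups for x in g)
    ss_between = k * sum((m - grand) ** 2 for m in group_means)
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    msb = ss_between / (n_groups - 1)
    msw = ss_within / (n_groups * (k - 1))
    icc1 = (msb - msw) / (msb + (k - 1) * msw)
    icc2 = (msb - msw) / msb
    return icc1, icc2


def to_likert(x: float, points: int = 5, lo: float = -2.0, hi: float = 2.0) -> int:
    """Map a continuous true score onto a 1..points scale by equal-width bins;
    scores outside [lo, hi] are censored into the end categories."""
    width = (hi - lo) / points
    return min(points, max(1, int((x - lo) // width) + 1))
```

Applying `to_likert` to every simulated score before calling `icc` lets one compare the indices computed from the same data with and without the discretization and end-point censoring.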

Clustering and spatial correlations of the neuronal cytoplasmic inclusions, astrocytic plaques and ballooned neurons in corticobasal degeneration

Relevance:

30.00%

Publisher:

Abstract:

This study tested three hypotheses: (1) that there is clustering of the neuronal cytoplasmic inclusions (NCI), astrocytic plaques (AP) and ballooned neurons (BN) in corticobasal degeneration (CBD), (2) that the clusters of NCI and BN are not spatially correlated, and (3) that the lesions are correlated with disease ‘stage’. In 50% of the regions, clusters of lesions were 400–800 µm in diameter and regularly distributed parallel to the tissue boundary. Clusters of NCI and BN were larger in laminae II/III and V/VI, respectively. In a third of regions, the clusters of BN and NCI were negatively spatially correlated. Cluster size of the BN in the parahippocampal gyrus (PHG) was positively correlated with disease ‘stage’. The data suggest the following: (1) degeneration of the cortico-cortical pathways in CBD, (2) clusters of NCI and BN may affect different anatomical pathways and (3) BN may develop after the NCI in the PHG.

Correlations between the clustering patterns of the pathological changes in sporadic Creutzfeldt-Jakob disease

Relevance:

30.00%

Publisher:

Abstract:

Correlations between the clustering patterns of the vacuolation ('spongiform change'), prion protein (PrP) deposits, and surviving neurons were studied in the cerebral cortex, hippocampus, and cerebellum in 11 cases of sporadic Creutzfeldt-Jakob disease (sCJD). Differences in the sizes of the clusters of vacuoles were observed between brain regions and, in the cerebral cortex, between the upper and lower laminae. With the exception of the parietal cortex, mean cluster size of the vacuoles was similar to that of the PrP deposits in each brain area. Clusters of the vacuoles were spatially correlated with the density of surviving neurons and with the clusters of PrP deposits in 47% and 53% of the cortical areas analysed, respectively, but there were few spatial correlations between the PrP deposits and the density of surviving neurons. The data suggest that the pathology of sCJD may spread through the brain via specific anatomical pathways. Development of the clusters of vacuoles is spatially related to surviving neurons, while the appearance of clusters of PrP deposits is related to the development of the vacuolation.

MILVA: An interactive tool for the exploration of multidimensional microarray data

Relevance:

30.00%

Publisher:

Abstract:

Clustering techniques such as k-means and hierarchical clustering are commonly used to analyze DNA microarray derived gene expression data. However, the interactions between processes underlying the cell activity suggest that the complexity of the microarray data structure may not be fully represented with discrete clustering methods.

Finding natural groups in data: an application to strategic group research

Relevance:

30.00%

Publisher:

Abstract:

The best way of finding “natural groups” in management research remains subject to debate, and the literature has no accepted consensus. The principal motivation behind this study is to explore the effect of method choices on strategic group research, an area that has suffered enduring criticism, because we believe that these method choices are still not fully exploited. Our study is novel in its use of a variety of more robust clustering and validation techniques, rarely used in management research and some borrowed from the natural sciences, which may provide a useful and more robust base for this type of research. Our results confirm that methods do exist to address the concerns over strategic group research, and that adopting our chosen methods will improve the quality of management research.
