832 resultados para databases and data mining


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Poster at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Poster at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Companies require information in order to gain an improved understanding of their customers. Data concerning customers, their interests and behavior are collected through different loyalty programs. The amount of data stored in company data bases has increased exponentially over the years and become difficult to handle. This research area is the subject of much current interest, not only in academia but also in practice, as is shown by several magazines and blogs that are covering topics on how to get to know your customers, Big Data, information visualization, and data warehousing. In this Ph.D. thesis, the Self-Organizing Map and two extensions of it – the Weighted Self-Organizing Map (WSOM) and the Self-Organizing Time Map (SOTM) – are used as data mining methods for extracting information from large amounts of customer data. The thesis focuses on how data mining methods can be used to model and analyze customer data in order to gain an overview of the customer base, as well as, for analyzing niche-markets. The thesis uses real world customer data to create models for customer profiling. Evaluation of the built models is performed by CRM experts from the retailing industry. The experts considered the information gained with help of the models to be valuable and useful for decision making and for making strategic planning for the future.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The objective of this study is to retrospectively report the results of interventions for controlling a vancomycin-resistant enterococcus (VRE) outbreak in a tertiary-care pediatric intensive care unit (PICU) of a University Hospital. After identification of the outbreak, interventions were made at the following levels: patient care, microbiological surveillance, and medical and nursing staff training. Data were collected from computer-based databases and from the electronic prescription system. Vancomycin use progressively increased after March 2008, peaking in August 2009. Five cases of VRE infection were identified, with 3 deaths. After the interventions, we noted a significant reduction in vancomycin prescription and use (75% reduction), and the last case of VRE infection was identified 4 months later. The survivors remained colonized until hospital discharge. After interventions there was a transient increase in PICU length-of-stay and mortality. Since then, the use of vancomycin has remained relatively constant and strict, no other cases of VRE infection or colonization have been identified and length-of-stay and mortality returned to baseline. In conclusion, we showed that a bundle intervention aiming at a strict control of vancomycin use and full compliance with the Hospital Infection Control Practices Advisory Committee guidelines, along with contact precautions and hand-hygiene promotion, can be effective in reducing vancomycin use and the emergence and spread of vancomycin-resistant bacteria in a tertiary-care PICU.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The radial approach is widely used in the treatment of patients with coronary artery disease. We conducted a meta-analysis of published results on the efficacy and safety of the left and right radial approaches in patients undergoing percutaneous coronary procedures. A systematic search of reference databases was conducted, and data from 14 randomized controlled trials involving 6870 participants were analyzed. The left radial approach was associated with significant reductions in fluoroscopy time [standardized mean difference (SMD)=-0.14, 95% confidence interval (CI)=-0.19 to -0.09; P<0.00001] and contrast volume (SMD=-0.07, 95%CI=-0.12 to -0.02; P=0.009). There were no significant differences in rate of procedural failure of the left and the right radial approaches [risk ratios (RR)=0.98; 95%CI=0.77-1.25; P=0.88] or procedural time (SMD=-0.05, 95%CI=0.17-0.06; P=0.38). Tortuosity of the subclavian artery (RR=0.27, 95%CI=0.14-0.50; P<0.0001) was reported more frequently with the right radial approach. A greater number of catheters were used with the left than with the right radial approach (SMD=0.25, 95%CI=0.04-0.46; P=0.02). We conclude that the left radial approach is as safe as the right radial approach, and that the left radial approach should be recommended for use in percutaneous coronary procedures, especially in percutaneous coronary angiograms.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The aim of this study was to contribute to the current knowledge-based theory by focusing on a research gap that exists in the empirically proven determination of the simultaneous but differentiable effects of intellectual capital (IC) assets and knowledge management (KM) practices on organisational performance (OP). The analysis was built on the past research and theoreticised interactions between the latent constructs specified using the survey-based items that were measured from a sample of Finnish companies for IC and KM and the dependent construct for OP determined using information available from financial databases. Two widely used and commonly recommended measures in the literature on management science, i.e. the return on total assets (ROA) and the return on equity (ROE), were calculated for OP. Thus the investigation of the relationship between IC and KM impacting OP in relation to the hypotheses founded was possible to conduct using objectively derived performance indicators. Using financial OP measures also strengthened the dynamic features of data needed in analysing simultaneous and causal dependences between the modelled constructs specified using structural path models. The estimates were obtained for the parameters of structural path models using a partial least squares-based regression estimator. Results showed that the path dependencies between IC and OP or KM and OP were always insignificant when analysed separate to any other interactions or indirect effects caused by simultaneous modelling and regardless of the OP measure used that was either ROA or ROE. The dependency between the constructs for KM and IC appeared to be very strong and was always significant when modelled simultaneously with other possible interactions between the constructs and using either ROA or ROE to define OP. This study, however, did not find statistically unambiguous evidence for proving the hypothesised causal mediation effects suggesting, for instance, that the effects of KM practices on OP are mediated by the IC assets. Due to the fact that some indication about the fluctuations of causal effects was assessed, it was concluded that further studies are needed for verifying the fundamental and likely hidden causal effects between the constructs of interest. Therefore, it was also recommended that complementary modelling and data processing measures be conducted for elucidating whether the mediation effects occur between IC, KM and OP, the verification of which requires further investigations of measured items and can be build on the findings of this study.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Spatial data representation and compression has become a focus issue in computer graphics and image processing applications. Quadtrees, as one of hierarchical data structures, basing on the principle of recursive decomposition of space, always offer a compact and efficient representation of an image. For a given image, the choice of quadtree root node plays an important role in its quadtree representation and final data compression. The goal of this thesis is to present a heuristic algorithm for finding a root node of a region quadtree, which is able to reduce the number of leaf nodes when compared with the standard quadtree decomposition. The empirical results indicate that, this proposed algorithm has quadtree representation and data compression improvement when in comparison with the traditional method.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Mobile augmented reality applications are increasingly utilized as a medium for enhancing learning and engagement in history education. Although these digital devices facilitate learning through immersive and appealing experiences, their design should be driven by theories of learning and instruction. We provide an overview of an evidence-based approach to optimize the development of mobile augmented reality applications that teaches students about history. Our research aims to evaluate and model the impacts of design parameters towards learning and engagement. The research program is interdisciplinary in that we apply techniques derived from design-based experiments and educational data mining. We outline the methodological and analytical techniques as well as discuss the implications of the anticipated findings.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Mobile augmented reality applications are increasingly utilized as a medium for enhancing learning and engagement in history education. Although these digital devices facilitate learning through immersive and appealing experiences, their design should be driven by theories of learning and instruction. We provide an overview of an evidence-based approach to optimize the development of mobile augmented reality applications that teaches students about history. Our research aims to evaluate and model the impacts of design parameters towards learning and engagement. The research program is interdisciplinary in that we apply techniques derived from design-based experiments and educational data mining. We outline the methodological and analytical techniques as well as discuss the implications of the anticipated findings.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The curse of dimensionality is a major problem in the fields of machine learning, data mining and knowledge discovery. Exhaustive search for the most optimal subset of relevant features from a high dimensional dataset is NP hard. Sub–optimal population based stochastic algorithms such as GP and GA are good choices for searching through large search spaces, and are usually more feasible than exhaustive and deterministic search algorithms. On the other hand, population based stochastic algorithms often suffer from premature convergence on mediocre sub–optimal solutions. The Age Layered Population Structure (ALPS) is a novel metaheuristic for overcoming the problem of premature convergence in evolutionary algorithms, and for improving search in the fitness landscape. The ALPS paradigm uses an age–measure to control breeding and competition between individuals in the population. This thesis uses a modification of the ALPS GP strategy called Feature Selection ALPS (FSALPS) for feature subset selection and classification of varied supervised learning tasks. FSALPS uses a novel frequency count system to rank features in the GP population based on evolved feature frequencies. The ranked features are translated into probabilities, which are used to control evolutionary processes such as terminal–symbol selection for the construction of GP trees/sub-trees. The FSALPS metaheuristic continuously refines the feature subset selection process whiles simultaneously evolving efficient classifiers through a non–converging evolutionary process that favors selection of features with high discrimination of class labels. We investigated and compared the performance of canonical GP, ALPS and FSALPS on high–dimensional benchmark classification datasets, including a hyperspectral image. Using Tukey’s HSD ANOVA test at a 95% confidence interval, ALPS and FSALPS dominated canonical GP in evolving smaller but efficient trees with less bloat expressions. FSALPS significantly outperformed canonical GP and ALPS and some reported feature selection strategies in related literature on dimensionality reduction.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The curse of dimensionality is a major problem in the fields of machine learning, data mining and knowledge discovery. Exhaustive search for the most optimal subset of relevant features from a high dimensional dataset is NP hard. Sub–optimal population based stochastic algorithms such as GP and GA are good choices for searching through large search spaces, and are usually more feasible than exhaustive and determinis- tic search algorithms. On the other hand, population based stochastic algorithms often suffer from premature convergence on mediocre sub–optimal solutions. The Age Layered Population Structure (ALPS) is a novel meta–heuristic for overcoming the problem of premature convergence in evolutionary algorithms, and for improving search in the fitness landscape. The ALPS paradigm uses an age–measure to control breeding and competition between individuals in the population. This thesis uses a modification of the ALPS GP strategy called Feature Selection ALPS (FSALPS) for feature subset selection and classification of varied supervised learning tasks. FSALPS uses a novel frequency count system to rank features in the GP population based on evolved feature frequencies. The ranked features are translated into probabilities, which are used to control evolutionary processes such as terminal–symbol selection for the construction of GP trees/sub-trees. The FSALPS meta–heuristic continuously refines the feature subset selection process whiles simultaneously evolving efficient classifiers through a non–converging evolutionary process that favors selection of features with high discrimination of class labels. We investigated and compared the performance of canonical GP, ALPS and FSALPS on high–dimensional benchmark classification datasets, including a hyperspectral image. Using Tukey’s HSD ANOVA test at a 95% confidence interval, ALPS and FSALPS dominated canonical GP in evolving smaller but efficient trees with less bloat expressions. FSALPS significantly outperformed canonical GP and ALPS and some reported feature selection strategies in related literature on dimensionality reduction.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Triple quadrupole mass spectrometers coupled with high performance liquid chromatography are workhorses in quantitative bioanalyses. It provides substantial benefits including reproducibility, sensitivity and selectivity for trace analysis. Selected Reaction Monitoring allows targeted assay development but data sets generated contain very limited information. Data mining and analysis of non-targeted high-resolution mass spectrometry profiles of biological samples offer the opportunity to perform more exhaustive assessments, including quantitative and qualitative analysis. The objectives of this study was to test method precision and accuracy, statistically compare bupivacaine drug concentration in real study samples and verify if high resolution and accurate mass data collected in scan mode can actually permit retrospective data analysis, more specifically, extract metabolite related information. The precision and accuracy data presented using both instruments provided equivalent results. Overall, the accuracy was ranging from 106.2 to 113.2% and the precision observed was from 1.0 to 3.7%. Statistical comparisons using a linear regression between both methods reveal a coefficient of determination (R2) of 0.9996 and a slope of 1.02 demonstrating a very strong correlation between both methods. Individual sample comparison showed differences from -4.5% to 1.6% well within the accepted analytical error. Moreover, post acquisition extracted ion chromatograms at m/z 233.1648 ± 5 ppm (M-56) and m/z 305.2224 ± 5 ppm (M+16) revealed the presence of desbutyl-bupivacaine and three distinct hydroxylated bupivacaine metabolites. Post acquisition analysis allowed us to produce semiquantitative evaluations of the concentration-time profiles for bupicavaine metabolites.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Les positions des évènements de recombinaison s’agrègent ensemble, formant des hotspots déterminés en partie par la protéine à évolution rapide PRDM9. En particulier, ces positions de hotspots sont déterminées par le domaine de doigts de zinc (ZnF) de PRDM9 qui reconnait certains motifs d’ADN. Les allèles de PRDM9 contenant le ZnF de type k ont été préalablement associés avec une cohorte de patients affectés par la leucémie aigüe lymphoblastique. Les allèles de PRDM9 sont difficiles à identifier à partir de données de séquençage de nouvelle génération (NGS), en raison de leur nature répétitive. Dans ce projet, nous proposons une méthode permettant la caractérisation d’allèles de PRDM9 à partir de données de NGS, qui identifie le nombre d’allèles contenant un type spécifique de ZnF. Cette méthode est basée sur la corrélation entre les profils représentant le nombre de séquences nucléotidiques uniques à chaque ZnF retrouvés chez les lectures de NGS simulées sans erreur d’une paire d’allèles et chez les lectures d’un échantillon. La validité des prédictions obtenues par notre méthode est confirmée grâce à analyse basée sur les simulations. Nous confirmons également que la méthode peut correctement identifier le génotype d’allèles de PRDM9 qui n’ont pas encore été identifiés. Nous conduisons une analyse préliminaire identifiant le génotype des allèles de PRDM9 contenant un certain type de ZnF dans une cohorte de patients atteints de glioblastomes multiforme pédiatrique, un cancer du cerveau caractérisé par les mutations récurrentes dans le gène codant pour l’histone H3, la cible de l’activité épigénétique de PRDM9. Cette méthode ouvre la possibilité d’identifier des associations entre certains allèles de PRDM9 et d’autres types de cancers pédiatriques, via l’utilisation de bases de données de NGS de cellules tumorales.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Computational Biology is the research are that contributes to the analysis of biological data through the development of algorithms which will address significant research problems.The data from molecular biology includes DNA,RNA ,Protein and Gene expression data.Gene Expression Data provides the expression level of genes under different conditions.Gene expression is the process of transcribing the DNA sequence of a gene into mRNA sequences which in turn are later translated into proteins.The number of copies of mRNA produced is called the expression level of a gene.Gene expression data is organized in the form of a matrix. Rows in the matrix represent genes and columns in the matrix represent experimental conditions.Experimental conditions can be different tissue types or time points.Entries in the gene expression matrix are real values.Through the analysis of gene expression data it is possible to determine the behavioral patterns of genes such as similarity of their behavior,nature of their interaction,their respective contribution to the same pathways and so on. Similar expression patterns are exhibited by the genes participating in the same biological process.These patterns have immense relevance and application in bioinformatics and clinical research.Theses patterns are used in the medical domain for aid in more accurate diagnosis,prognosis,treatment planning.drug discovery and protein network analysis.To identify various patterns from gene expression data,data mining techniques are essential.Clustering is an important data mining technique for the analysis of gene expression data.To overcome the problems associated with clustering,biclustering is introduced.Biclustering refers to simultaneous clustering of both rows and columns of a data matrix. Clustering is a global whereas biclustering is a local model.Discovering local expression patterns is essential for identfying many genetic pathways that are not apparent otherwise.It is therefore necessary to move beyond the clustering paradigm towards developing approaches which are capable of discovering local patterns in gene expression data.A biclusters is a submatrix of the gene expression data matrix.The rows and columns in the submatrix need not be contiguous as in the gene expression data matrix.Biclusters are not disjoint.Computation of biclusters is costly because one will have to consider all the combinations of columans and rows in order to find out all the biclusters.The search space for the biclustering problem is 2 m+n where m and n are the number of genes and conditions respectively.Usually m+n is more than 3000.The biclustering problem is NP-hard.Biclustering is a powerful analytical tool for the biologist.The research reported in this thesis addresses the problem of biclustering.Ten algorithms are developed for the identification of coherent biclusters from gene expression data.All these algorithms are making use of a measure called mean squared residue to search for biclusters.The objective here is to identify the biclusters of maximum size with the mean squared residue lower than a given threshold. All these algorithms begin the search from tightly coregulated submatrices called the seeds.These seeds are generated by K-Means clustering algorithm.The algorithms developed can be classified as constraint based,greedy and metaheuristic.Constarint based algorithms uses one or more of the various constaints namely the MSR threshold and the MSR difference threshold.The greedy approach makes a locally optimal choice at each stage with the objective of finding the global optimum.In metaheuristic approaches particle Swarm Optimization(PSO) and variants of Greedy Randomized Adaptive Search Procedure(GRASP) are used for the identification of biclusters.These algorithms are implemented on the Yeast and Lymphoma datasets.Biologically relevant and statistically significant biclusters are identified by all these algorithms which are validated by Gene Ontology database.All these algorithms are compared with some other biclustering algorithms.Algorithms developed in this work overcome some of the problems associated with the already existing algorithms.With the help of some of the algorithms which are developed in this work biclusters with very high row variance,which is higher than the row variance of any other algorithm using mean squared residue, are identified from both Yeast and Lymphoma data sets.Such biclusters which make significant change in the expression level are highly relevant biologically.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This thesis Entitled Environmental impact of Sand Mining :A case Study in the river catchments of vembanad lake southwest india.The entire study is addressed in nine chapters. Chapter l deals with the general introduction about rivers, problems of river sand mining, objectives, location of the study area and scope of the study. A detailed review on river classification, classic concepts in riverine studies, geological work of rivers and channel processes, importance of river ecosystems and its need for management are dealt in Chapter 2. Chapter 3 gives a comprehensive account of the study area - its location, administrative divisions, physiography, soil, geology, land use and living and non-living resources. The various methods adopted in the study are dealt in Chapter 4. Chapter 5 contains river characteristics like drainage, environmental and geologic setting, channel characteristics, river discharge and water quality of the study area. Chapter 6 gives an account of river sand mining (instream and floodplain mining) from the study area. The various environmental problems of river sand mining on the land adjoining the river banks, river channel, water, biotic and social / human environments of the area and data interpretation are presented in Chapter 7. Chapter 8 deals with the Environmental Impact Assessment (EIA) and Environmental Management Plan (EMP) of sand mining from the river catchments of Vembanad lake.