916 results for Data Mining and its Application


Relevance:

100.00%

Abstract:

The large volume of data recorded every day in organizations' database systems has created the need to analyze it. However, organizations face the complexity of processing huge volumes of data with traditional analysis methods. Moreover, in a globalized and competitive environment, organizations constantly seek to improve their processes, which requires tools that help them make better decisions. This means being better informed and knowing their digital history in order to describe their processes and anticipate (predict) unforeseen events. These new data analysis requirements have driven the growing development of data mining projects. The data mining process seeks to obtain, from a massive data set, models that describe the data or predict new instances. It involves data preparation, followed by partially or fully automated processing to identify models in the data, and finally produces patterns, relationships, or rules as output. This output must represent new knowledge for the organization that is useful and understandable to end users and that can be integrated into processes to support decision-making. However, the greatest difficulty is precisely enabling the data analyst, who takes part in the whole process, to identify models; this is a complex task that often requires the experience not only of the data analyst but also of the expert in the problem domain. One way to support the analysis of data, models, and patterns is through visual representation, which draws on human visual perception and its ability to detect patterns more easily.
Under this approach, visualization has been used in data mining mostly for the descriptive analysis of data (input) and the presentation of patterns (output), leaving it underused for the analysis of models. This document describes the development of the doctoral thesis entitled "Nuevos Esquemas de Visualizaciones para Mejorar la Comprensibilidad de Modelos de Data Mining" ("New Visualization Schemes to Improve the Understandability of Data-Mining Models"). The research aims to contribute a visualization approach that supports the understanding of data mining models, and to that end proposes the metaphor of visually augmented models.

Relevance:

100.00%

Abstract:

The supermarket sector has undergone major changes in recent years, driven mainly by advances in technology, competition, market concentration, and some shortcomings in its processes. These and other factors favored the emergence of the ECR (Efficient Consumer Response) movement, which seeks to build a stronger relationship between manufacturers and retailers through new approaches to their operational strategies. Advances in information technology have allowed the retail sector to generate a larger volume of data, mainly from its checkouts. However, these data are not always stored correctly or used in a way that takes full advantage of the information they contain. The process of transforming data into information and knowledge is constantly evolving. One current methodology for working with data is data mining, which can be described as a variety of tools and strategies that process data and increase their usefulness in databases. Through an exploratory multiple-case study in the Ribeirão Preto region, in the interior of the state of São Paulo, this work evaluates the capacity of data mining technology to strengthen the ECR movement, especially among small and medium-sized retailers and food manufacturers, by offering them a negotiating advantage in forming strategic alliances.

Relevance:

100.00%

Abstract:

In the wake of the disclosures surrounding PRISM and other US surveillance programmes, this paper assesses the large-scale surveillance practices by a selection of EU member states: the UK, Sweden, France, Germany and the Netherlands. Given the large-scale nature of these practices, which represent a reconfiguration of traditional intelligence gathering, the paper contends that an analysis of European surveillance programmes cannot be reduced to a question of the balance between data protection versus national security, but has to be framed in terms of collective freedoms and democracy. It finds that four of the five EU member states selected for in-depth examination are engaging in some form of large-scale interception and surveillance of communication data, and identifies parallels and discrepancies between these programmes and the NSA-run operations. The paper argues that these programmes do not stand outside the realm of EU intervention but can be analysed from an EU law perspective via i) an understanding of national security in a democratic rule of law framework where fundamental human rights and judicial oversight constitute key norms; ii) the risks posed to the internal security of the Union as a whole as well as the privacy of EU citizens as data owners and iii) the potential spillover into the activities and responsibilities of EU agencies. The paper then presents a set of policy recommendations to the European Parliament.

Relevance:

100.00%

Abstract:

When applying multivariate analysis techniques in information systems and social science disciplines, such as management information systems (MIS) and marketing, the assumption that the empirical data originate from a single homogeneous population is often unrealistic. When applying a causal modeling approach, such as partial least squares (PLS) path modeling, segmentation is a key issue in coping with the problem of heterogeneity in estimated cause-and-effect relationships. This chapter presents a new PLS path modeling approach which classifies units on the basis of the heterogeneity of the estimates in the inner model. If unobserved heterogeneity significantly affects the estimated path model relationships on the aggregate data level, the methodology will allow homogenous groups of observations to be created that exhibit distinctive path model estimates. The approach will, thus, provide differentiated analytical outcomes that permit more precise interpretations of each segment formed. An application on a large data set in an example of the American customer satisfaction index (ACSI) substantiates the methodology’s effectiveness in evaluating PLS path modeling results.

Relevance:

100.00%

Abstract:

This thesis is concerned with certain aspects of the Public Inquiry into the accident at Houghton Main Colliery in June 1975. It examines whether prior to the accident there existed at the Colliery a situation in which too much reliance was being placed upon state regulation and too little upon personal responsibility. I study the phenomenon of state regulation. This is done (a) by analysis of selected writings on state regulation/intervention/interference/bureaucracy (the words are used synonymously) over the last two hundred years, specifically those of Marx on the 1866 Committee on Mines, and (b) by studying Chadwick and Tremenheere, leading and contrasting "bureaucrats" of the mid-nineteenth century. The bureaucratisation of the mining industry over the period 1835-1954 is described, and it is demonstrated that the industry obtained and now possesses those characteristics outlined by Max Weber in his model of bureaucracy. I analyse criticisms of the model and find them to be relevant, in that they facilitate understanding both of the circumstances of the accident and of the Inquiry. Further understanding of the circumstances and causes of the accident was gained by attendance at the Inquiry and by interviewing many of those involved in the Inquiry. I analyse many aspects of the Inquiry - its objectives, structure, procedure and conflicting interests - and find that, although the Inquiry had many of the symbols of bureaucracy, it suffered not from "too much" outside interference, but rather from the coal mining industry's shared belief in its ability to solve its own problems. I found nothing to suggest that, prior to the accident, colliery personnel relied, or were encouraged to rely, "too much" upon state regulation.

Relevance:

100.00%

Abstract:

This paper presents the results of our data mining study of Pb-Zn (lead-zinc) ore assay records from a mine enterprise in Bulgaria. We examined the dataset, cleaned outliers, visualized the data, and created dataset statistics. A Pb-Zn cluster data mining model was created for segmentation and prediction of Pb-Zn ore assay data. The Pb-Zn cluster data model consists of five clusters and DMX queries. We analyzed the Pb-Zn cluster content, size, structure, and characteristics. The set of DMX queries allows for browsing and managing the clusters, as well as predicting ore assay records. Testing and validation of the Pb-Zn cluster data mining model were carried out in order to show its reasonable accuracy before being used in a production environment. The Pb-Zn cluster data mining model can be used to adjust the mine's grinding and flotation processing parameters in almost real time, which is important for the efficiency of the Pb-Zn ore beneficiation process. ACM Computing Classification System (1998): H.2.8, H.3.3.
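
As a rough illustration of the segmentation-and-prediction idea in this abstract, the sketch below clusters a handful of assay records and then assigns a new record to its nearest cluster. It is not the authors' model: the abstract describes a five-cluster model queried through DMX, whereas this sketch swaps in plain k-means with two clusters for brevity. The assay values, the (Pb %, Zn %) layout, and the names `k_means` and `nearest` are all assumptions made for the example.

```python
import math


def nearest(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def k_means(points, k, iterations=20):
    """Plain k-means; the first k points serve as deterministic initial centroids."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: group every point under its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = new_centroids
    return centroids


# Hypothetical assay records as (Pb %, Zn %); not data from the paper.
assays = [(1.0, 1.2), (5.0, 6.1), (1.2, 1.0), (5.2, 5.9), (0.9, 1.1), (4.8, 6.0)]
centroids = k_means(assays, k=2)

# "Prediction" here simply means assigning a new record to the nearest cluster.
low_grade_cluster = nearest((1.1, 0.9), centroids)
high_grade_cluster = nearest((5.1, 6.2), centroids)
```

Taking the first k points as initial centroids keeps the run reproducible; a real study would more likely use randomized or k-means++ initialization and validate the number of clusters.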

Relevance:

100.00%

Abstract:

Electronic database handling of business information has gradually gained popularity in the hospitality industry. This article provides an overview of the fundamental concepts of a hotel database and investigates the feasibility of incorporating computer-assisted data mining techniques into hospitality database applications. The author also exposes some potential myths associated with data mining in hospitality database applications.

Relevance:

100.00%

Abstract:

Online Social Network (OSN) services provided by Internet companies bring people together to chat and to share and enjoy information. Meanwhile, those services (which can be regarded as social media) generate huge amounts of data every day, every hour, even every minute and every second. Currently, researchers are interested in analyzing OSN data, extracting interesting patterns from it, and applying those patterns to real-world applications. However, the large scale of OSN data makes it difficult to analyze effectively. This dissertation focuses on applying data mining and information retrieval techniques to mine two key components of social media data: users and user-generated content. Specifically, it aims to address three problems related to social media users and content: (1) how does one organize the users and the content? (2) how does one summarize the textual content so that users do not have to go over every post to capture the general idea? (3) how does one identify the influential users in social media to benefit other applications, e.g., marketing campaigns? The contributions of this dissertation are briefly summarized as follows. (1) It provides a comprehensive and versatile data mining framework to analyze users and user-generated content from social media. (2) It designs a hierarchical co-clustering algorithm to organize the users and content. (3) It proposes multi-document summarization methods to extract core information from social network content. (4) It introduces three important dimensions of social influence, and a dynamic influence model for identifying influential users.

Relevance:

100.00%

Abstract:

Data mining can be defined as the extraction of implicit, previously unknown, and potentially useful information from data. Numerous researchers have been developing security technology and exploring new methods to detect cyber-attacks with the DARPA 1998 dataset for intrusion detection and its modified versions, KDDCup99 and NSL-KDD, but until now no one has examined the performance of the Top 10 data mining algorithms selected by experts in data mining. The classification learning algorithms compared in this thesis are C4.5, CART, k-NN and Naïve Bayes. Their performance is compared in terms of accuracy, error rate and average cost on modified versions of the NSL-KDD training and test datasets, in which the instances are classified as normal or as one of four cyber-attack categories: DoS, Probing, R2L and U2R. Additionally, the most important features for detecting cyber-attacks, across all categories and within each category, are evaluated with Weka’s Attribute Evaluator and ranked according to Information Gain. The results show that the classification algorithm with the best performance on the dataset is k-NN. The most important features for detecting cyber-attacks are basic features such as the duration of a network connection in seconds, the protocol used for the connection, the network service used, the normal or error status of the connection, and the number of data bytes sent. The most important features for detecting DoS, Probing and R2L attacks are basic features, and the least important are content features; for U2R attacks, by contrast, the content features are the most important.
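
To make the Information Gain ranking mentioned in this abstract concrete, here is a minimal sketch of how such a score can be computed for nominal features. It uses only the Python standard library and does not reproduce Weka's implementation; the four toy connection records, the feature names, and the helper names `entropy`, `information_gain`, and `rank_features` are assumptions for illustration and are not drawn from NSL-KDD.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the expected entropy after splitting on the feature."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    total = len(labels)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(features, labels):
    """Return (feature name, gain) pairs sorted from most to least informative."""
    gains = {name: information_gain(values, labels) for name, values in features.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy records: two normal connections and two DoS connections.
labels = ["normal", "normal", "DoS", "DoS"]
features = {
    "flag": ["SF", "REJ", "SF", "REJ"],            # unrelated to the label here
    "protocol": ["tcp", "tcp", "icmp", "icmp"],    # separates the classes perfectly
}

ranking = rank_features(features, labels)
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

In this toy data the protocol splits the classes perfectly, so its gain equals the full label entropy of one bit, while the flag carries no information about the label. Numeric features such as connection duration would need to be discretized before a score of this kind applies.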

Relevance:

100.00%

Abstract:

The semiarid region of northeastern Brazil, the Caatinga, is extremely important due to its biodiversity and endemism. Measurements of plant physiology are crucial to the calibration of Dynamic Global Vegetation Models (DGVMs), which are currently used to simulate the responses of vegetation in the face of global changes. In fieldwork carried out in an area of preserved Caatinga forest located in Petrolina, Pernambuco, measurements of carbon assimilation (in response to light and CO2) were performed on 11 individuals of Poincianella microphylla, a native species that is abundant in this region. These data were used to calibrate the maximum carboxylation velocity (Vcmax) used in the INLAND model. The calibration techniques used were Multiple Linear Regression (MLR) and data mining techniques such as the Classification And Regression Tree (CART) and K-MEANS. The results were compared to the UNCALIBRATED model. It was found that simulated Gross Primary Productivity (GPP) reached 72% of observed GPP when using the calibrated Vcmax values, whereas the UNCALIBRATED approach accounted for 42% of observed GPP. Thus, this work shows the benefits of calibrating DGVMs using field ecophysiological measurements, especially in areas where field data are scarce or non-existent, such as in the Caatinga.

Relevance:

100.00%

Abstract:

Background: Lignin and hemicelluloses are the major components limiting enzyme infiltration into cell walls. Determination of the topochemical distribution of lignin and aromatics in sugar cane might provide important data on the recalcitrance of specific cells. We used cellular ultraviolet (UV) microspectrophotometry (UMSP) to topochemically detect lignin and hydroxycinnamic acids in individual fiber, vessel and parenchyma cell walls of untreated and chlorite-treated sugar cane. Internodes, presenting typical vascular bundles and sucrose-storing parenchyma cells, were divided into rind and pith fractions. Results: Vascular bundles were more abundant in the rind, whereas parenchyma cells predominated in the pith region. UV measurements of untreated fiber cell walls gave absorbance spectra typical of grass lignin, with a band at 278 nm and a pronounced shoulder at 315 nm, assigned to the presence of hydroxycinnamic acids linked to lignin and/or to arabino-methylglucurono-xylans. The cell walls of vessels had the highest level of lignification, followed by those of fibers and parenchyma. Pith parenchyma cell walls were characterized by very low absorbance values at 278 nm; however, a distinct peak at 315 nm indicated that pith parenchyma cells are not extensively lignified, but contain significant amounts of hydroxycinnamic acids. Cellular UV image profiles scanned with an absorbance intensity maximum of 278 nm identified the pattern of lignin distribution in the individual cell walls, with the highest concentration occurring in the middle lamella and cell corners. Chlorite treatment caused a rapid removal of hydroxycinnamic acids from parenchyma cell walls, whereas the thicker fiber cell walls were delignified only after a long treatment duration (4 hours). Untreated pith samples were promptly hydrolyzed by cellulases, reaching 63% of cellulose conversion after 72 hours of hydrolysis, whereas untreated rind samples achieved only 20% hydrolyzation. 
Conclusion: The low recalcitrance of pith cells correlated with the low UV-absorbance values seen in parenchyma cells. Chlorite treatment of pith cells did not enhance cellulose conversion. By contrast, application of the same treatment to rind cells led to significant removal of hydroxycinnamic acids and lignin, resulting in marked enhancement of cellulose conversion by cellulases.

Relevance:

100.00%

Abstract:

Melanoma is a highly aggressive and therapy resistant tumor for which the identification of specific markers and therapeutic targets is highly desirable. We describe here the development and use of a bioinformatic pipeline tool, made publicly available under the name of EST2TSE, for the in silico detection of candidate genes with tissue-specific expression. Using this tool we mined the human EST (Expressed Sequence Tag) database for sequences derived exclusively from melanoma. We found 29 UniGene clusters of multiple ESTs with the potential to predict novel genes with melanoma-specific expression. Using a diverse panel of human tissues and cell lines, we validated the expression of a subset of three previously uncharacterized genes (clusters Hs.295012, Hs.518391, and Hs.559350) to be highly restricted to melanoma/melanocytes and named them RMEL1, 2 and 3, respectively. Expression analysis in nevi, primary melanomas, and metastatic melanomas revealed RMEL1 as a novel melanocytic lineage-specific gene up-regulated during melanoma development. RMEL2 expression was restricted to melanoma tissues and glioblastoma. RMEL3 showed strong up-regulation in nevi and was lost in metastatic tumors. Interestingly, we found correlations of RMEL2 and RMEL3 expression with improved patient outcome, suggesting tumor and/or metastasis suppressor functions for these genes. The three genes are composed of multiple exons and map to 2q12.2, 1q25.3, and 5q11.2, respectively. They are well conserved throughout primates, but not other genomes, and were predicted as having no coding potential, although primate-conserved and human-specific short ORFs could be found. Hairpin RNA secondary structures were also predicted. Concluding, this work offers new melanoma-specific genes for future validation as prognostic markers or as targets for the development of therapeutic strategies to treat melanoma.

Relevance:

100.00%

Abstract:

This work proposes a method based on both preprocessing and data mining with the objective of identifying harmonic current sources in residential consumers. In addition, this methodology can also be applied to identify linear and nonlinear loads. It should be emphasized that the entire database was obtained through laboratory tests, i.e., real data were acquired from residential loads. The residential system created in the laboratory was fed by a configurable power source, and the loads and the power quality analyzers were placed at its output (all measurements were stored in a microcomputer). The data were then submitted to preprocessing, which was based on attribute selection techniques in order to reduce the complexity of identifying the loads. A new database was generated keeping only the selected attributes, and Artificial Neural Networks were then trained to identify the loads. In order to validate the proposed methodology, the loads were fed both under ideal conditions (without harmonics) and with harmonic voltages within pre-established limits. These limits are in accordance with IEEE Std. 519-1992 and PRODIST (the electricity distribution procedures used by Brazilian utilities). The results obtained seek to validate the proposed methodology and provide a method that can serve as an alternative to conventional methods.

Relevance:

100.00%

Abstract:

An enantioselective liquid chromatographic method using two-phase hollow fiber liquid-phase microextraction (HF-LPME-HPLC) was developed for the determination of isradipine (ISR) enantiomers and its main metabolite (pyridine derivative of isradipine, PDI) in microsomal fractions isolated from rat liver. The analytes were extracted from 1 mL of microsomal medium using a two-phase HF-LPME procedure with hexyl acetate as the acceptor phase, 30 min of extraction, and sample agitation at 1,500 rpm. For the first time, ISR enantiomers and PDI were resolved. For this separation, a Chiralpak® AD column with hexane/2-propanol/ethanol (94:4:2, v/v/v) as the mobile phase at a flow rate of 1.5 mL min⁻¹ was used. The column was kept at 23 ± 2 °C. The drug and metabolite detection was performed at 325 nm and the internal standard oxybutynin was detected at 225 nm. The recoveries were 23% for PDI and 19% for each ISR enantiomer. The method presented quantification limits (LOQ) of 50 ng mL⁻¹ and was linear over the concentration range of 50-5,000 and 50-2,500 ng mL⁻¹ for PDI and each ISR enantiomer, respectively. The validated method was applied to an in vitro biotransformation study of ISR using rat liver microsomal fraction, showing that (+)-(S)-ISR is preferentially biotransformed.

Relevance:

100.00%

Abstract:

The new technologies for Knowledge Discovery from Databases (KDD) and data mining promise to bring new insights into a large and growing volume of biological data. KDD technology is complementary to laboratory experimentation and helps speed up biological research. This article contains an introduction to KDD, a review of data mining tools, and their biological applications. We discuss the domain concepts related to biological data and databases, as well as current KDD and data mining developments in biology.