877 results for data-mining application


Relevance:

90.00%

Abstract:

This thesis is the final outcome of the European Joint Doctorate programme in Law, Science & Technology, funded by the European Commission through the Marie Skłodowska-Curie Innovative Training Networks actions within H2020, grant agreement n. 814177. It investigates the tension between data protection and privacy on one side and the need to allow further uses of processed personal data on the other, tracing the technological development of de-anonymization/re-identification risk through an exploratory survey. Having established the extent of this risk, it asks whether a certain degree of anonymity can still be guaranteed, from two perspectives: an objective one and a subjective one. The objective perspective focuses on the data processing models per se, while the subjective perspective investigates whether the distribution of roles and responsibilities among stakeholders can ensure data anonymity.
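The abstract does not name a specific anonymity metric. As an illustration of what the objective perspective can measure, the sketch below assumes the widely used k-anonymity criterion: a table is k-anonymous when every combination of quasi-identifier values is shared by at least k records. The table, column names and choice of quasi-identifiers are hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence


def k_anonymity(records: Iterable[Mapping[str, str]], quasi_identifiers: Sequence[str]) -> int:
    """Return k, the size of the smallest group of records sharing the same quasi-identifier values."""
    # Each distinct combination of quasi-identifier values forms one equivalence class.
    classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    if not classes:
        raise ValueError("k-anonymity is undefined for an empty table")
    return min(classes.values())


# Hypothetical, already generalised table: ZIP prefix, age band and sex are the quasi-identifiers.
rows = [
    {"zip": "401**", "age": "30-39", "sex": "F", "diagnosis": "flu"},
    {"zip": "401**", "age": "30-39", "sex": "F", "diagnosis": "asthma"},
    {"zip": "402**", "age": "40-49", "sex": "M", "diagnosis": "flu"},
    {"zip": "402**", "age": "40-49", "sex": "M", "diagnosis": "diabetes"},
    {"zip": "402**", "age": "40-49", "sex": "M", "diagnosis": "flu"},
]

k = k_anonymity(rows, ["zip", "age", "sex"])
print(k)  # 2: every record is indistinguishable from at least one other
```

Treating `diagnosis` as an additional quasi-identifier drops k to 1, meaning at least one record can be singled out, which is the kind of re-identification risk the survey traces.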

Relevance:

90.00%

Abstract:

In the era of precision medicine and large-scale sharing of medical data, the workflow of digital radiological big data must be managed productively and effectively. In particular, it is now possible to extract information “hidden” in digital images in order to build diagnostic algorithms that help clinicians design more personalized therapies, a key aim of modern oncology. Digital images of a patient have a “texture” structure that is not visible but encoded in the data; it is “hidden” because it cannot be recognized by sight alone. Using artificial intelligence, pre- and post-processing software and purpose-built mathematical algorithms, a classification can be performed based on non-visible data contained in radiological images. Calculating the volume of the body's tissue compartments (body composition) could make it possible to cluster patients into classes within standard morphological reference tables based on human anatomy and stratified by gender and age, and perhaps in future also by race. Furthermore, “morpho-radiology” is a useful approach to problems concerning personalized therapy, which is particularly needed in oncology, where treatments are no longer based on generic drugs but on targeted, personalized therapy. The current lack of gender- and age-specific therapy tables could be addressed through morpho-radiology data analysis.
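The abstract does not describe how tissue volumes are computed. A common approach, assumed here, is to count the voxels carrying each tissue label in a segmented scan and multiply by the voxel volume given by the scan spacing. The label map, label meanings and spacing below are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

LabelMap = Sequence[Sequence[Sequence[int]]]  # indexed [slice][row][column]


def tissue_volumes_ml(label_map: LabelMap, spacing_mm: Tuple[float, float, float]) -> Dict[int, float]:
    """Volume in millilitres of each non-zero tissue label in a segmented 3-D scan."""
    dz, dy, dx = spacing_mm
    voxel_ml = dz * dy * dx / 1000.0  # 1 mL = 1000 mm^3
    counts = Counter(v for plane in label_map for row in plane for v in row)
    counts.pop(0, None)  # label 0 is treated as background
    return {label: n * voxel_ml for label, n in counts.items()}


# Hypothetical 2-slice segmentation: 1 = skeletal muscle, 2 = adipose tissue.
label_map = [
    [[0, 1, 1], [2, 2, 0]],
    [[1, 2, 2], [2, 0, 0]],
]
volumes = tissue_volumes_ml(label_map, spacing_mm=(5.0, 2.0, 2.0))
print(volumes)
```

With 5 mm × 2 mm × 2 mm voxels (0.02 mL each), the three muscle voxels give 0.06 mL and the five adipose voxels 0.10 mL.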

Relevance:

90.00%

Abstract:

A global Italian pharmaceutical company needs to provide two work environments serving different needs. Both environments will allow solutions to be developed in a controlled and secure, yet independent, manner on a state-of-the-art enterprise cloud platform. The need for two separate environments is dictated by the requirements of the business units. The first environment is designed to facilitate the creation of genomics applications and is therefore aimed mainly at data scientists. It can consume, produce, retrieve and incorporate data, and it will support the programming languages most used for genomic applications (e.g., Python, R). The proposal was to provide a pool of ready-to-go virtual machines with different architectures, so that each job runs on the configuration that gives it the best performance. The second environment is more traditional: through an ETL (Extract-Transform-Load) process it builds a global data model resembling a classical relational structure. It will provide the main BI operations (e.g., analytics, performance measurement, reporting), which can be used both for application analysis and internally. Since both architectures will hold large amounts of data, covering not only pharmaceutical information but also internal company information, the data can be consumed by reporting/analytics tools and explored with data-mining and machine-learning techniques to extract the information it contains. This thesis presents the proposals, implementations, technologies/platforms used and future work for the two environments described above.
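The ETL step of the second environment can be pictured with a minimal sketch, assuming Python's built-in `sqlite3` as a stand-in for the enterprise warehouse; the source does not name the platform, and the tables, columns and sample rows are hypothetical.

```python
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical raw export from an operational system; field names are illustrative only.
RAW_ROWS: List[Dict[str, str]] = [
    {"product": " Aspirin ", "country": "it", "units": "120"},
    {"product": "Aspirin", "country": "IT", "units": "30"},
    {"product": "Ibuprofen", "country": "de", "units": "75"},
]


def extract() -> List[Dict[str, str]]:
    # In practice this would read from files, APIs or source databases.
    return list(RAW_ROWS)


def transform(rows: List[Dict[str, str]]) -> List[Tuple[str, str, int]]:
    # Clean text fields and cast quantities so the rows fit the target schema.
    return [(r["product"].strip(), r["country"].upper(), int(r["units"])) for r in rows]


def load(conn: sqlite3.Connection, rows: List[Tuple[str, str, int]]) -> None:
    # A tiny relational target: one dimension table and one fact table.
    conn.executescript(
        """
        CREATE TABLE dim_product (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
        CREATE TABLE fact_sales (
            product_id INTEGER NOT NULL REFERENCES dim_product(id),
            country TEXT NOT NULL,
            units INTEGER NOT NULL
        );
        """
    )
    for name, country, units in rows:
        conn.execute("INSERT OR IGNORE INTO dim_product (name) VALUES (?)", (name,))
        (product_id,) = conn.execute("SELECT id FROM dim_product WHERE name = ?", (name,)).fetchone()
        conn.execute("INSERT INTO fact_sales VALUES (?, ?, ?)", (product_id, country, units))
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))

# A typical BI-style query on the loaded model: total units per product.
report = conn.execute(
    """
    SELECT p.name, SUM(f.units)
    FROM fact_sales AS f JOIN dim_product AS p ON p.id = f.product_id
    GROUP BY p.name ORDER BY p.name
    """
).fetchall()
print(report)  # [('Aspirin', 150), ('Ibuprofen', 75)]
```

Splitting product names into a dimension table and quantities into a fact table mirrors, on a small scale, the classical relational structure the abstract mentions; the transform step is where inconsistent spellings such as " Aspirin " and "Aspirin" are reconciled before loading.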

Relevance:

80.00%

Abstract:

The aim of this work was to use association rules to identify the market forces governing the sale of bulls genetically evaluated by the Nelore Brasil program. These rules reveal implicit patterns in the transactions of large databases, indicating the causes and effects that determine the supply and sale of bulls. The analysis considered 19,736 records of bulls sold, 17 farms and 15 attributes covering the expected progeny differences of the sires and the place and time of sale. A user-driven system with a graphical interface, allowing association rules to be generated and selected interactively, was used. Pareto analysis was applied to the three objective measures (support, confidence and lift) that accompany each association rule in order to validate the rules. A total of 2,667 association rules were generated, of which 164 were considered useful by the user and 107 were valid for lift ≥ 1.0505. The farms participating in the Nelore Brasil program specialize in supplying bulls according to traits for maternal ability, weight gain, fertility, sexual precocity, longevity, carcass yield and carcass finishing. The genetic profiles of the bulls differ between the standard and polled varieties. Some Brazilian regions are market niches for bulls without pedigree registration. The analysis of market evolution suggests that total genetic merit, the official index of the Nelore Brasil program, has become an important index for the sale of bulls. Using association rules, it was possible to uncover market forces and identify combinations of genetic, geographic and temporal attributes that determine the sale of bulls in the Nelore Brasil program.
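Support, confidence and lift are the three objective measures the abstract attaches to each rule. The sketch below computes them from their standard definitions (support of X→Y is the fraction of transactions containing both X and Y; confidence = support(X∪Y)/support(X); lift = confidence/support(Y)) and keeps single-item rules at or above the abstract's lift threshold of 1.0505. The sale records are hypothetical, and the Pareto validation step is not reproduced.

```python
from typing import FrozenSet, List, Tuple

Transaction = FrozenSet[str]


def support(itemset: FrozenSet[str], transactions: List[Transaction]) -> float:
    # Fraction of transactions that contain every item of the itemset.
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rule_metrics(antecedent, consequent, transactions) -> Tuple[float, float, float]:
    """Support, confidence and lift of the rule antecedent -> consequent."""
    s_rule = support(antecedent | consequent, transactions)
    confidence = s_rule / support(antecedent, transactions)
    lift = confidence / support(consequent, transactions)
    return s_rule, confidence, lift


def single_item_rules(transactions: List[Transaction], min_lift: float):
    # Enumerate rules with one item on each side and keep those at or above the lift threshold.
    items = sorted(set().union(*transactions))
    kept = []
    for x in items:
        for y in items:
            if x == y:
                continue
            a, c = frozenset([x]), frozenset([y])
            if support(a | c, transactions) == 0:
                continue  # the pair never co-occurs
            s, conf, lift = rule_metrics(a, c, transactions)
            if lift >= min_lift:
                kept.append((x, y, s, conf, lift))
    return kept


# Hypothetical sale records encoded as attribute=value items.
sales = [
    frozenset({"region=A", "trait=fertility", "polled=yes"}),
    frozenset({"region=A", "trait=fertility", "polled=no"}),
    frozenset({"region=A", "trait=weight_gain", "polled=no"}),
    frozenset({"region=B", "trait=fertility", "polled=yes"}),
    frozenset({"region=B", "trait=weight_gain", "polled=no"}),
]

for rule in single_item_rules(sales, min_lift=1.0505):
    print(rule)
```

A lift above 1 means the consequent occurs more often among records containing the antecedent than in the data overall; here region=A → trait=fertility has support 0.4, confidence 0.67 and lift 1.11, while region=B → trait=fertility (lift 0.83) is discarded.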

Relevance:

80.00%

Abstract:

Age-related changes in running kinematics have been reported in the literature using classical inferential statistics. However, this approach has been hampered by the growing number of biomechanical gait variables reported and, consequently, by the lack of differences found in these studies. Data mining techniques, which take a more general approach, have been applied in recent biomedical studies to address this problem. In the present work, we re-analyzed lower-extremity running kinematic data of 17 young and 17 elderly male runners using the Support Vector Machine (SVM) classification approach. In total, 31 kinematic variables were extracted to train the classifier and test its generalization performance. Accuracy differed across the three kernel methods adopted in the classifier, with the linear kernel performing best. A subsequent forward feature selection algorithm showed that with only six features the linear-kernel SVM achieved a 100% classification rate, indicating that these features together provide powerful information for distinguishing the age groups. These results demonstrate the potential of this approach to improve knowledge of age-related differences in running gait biomechanics and encourage the use of SVMs in other clinical contexts. (C) 2010 Elsevier Ltd. All rights reserved.
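The abstract does not give its training setup. The sketch below assumes a linear SVM fitted by plain subgradient descent on the L2-regularised hinge loss, scored by leave-one-out accuracy, and wraps it in greedy forward feature selection. The two-feature data set, learning rate, regularisation strength and epoch count are hypothetical; a real analysis of 31 kinematic variables would normally rely on a dedicated SVM library.

```python
from typing import List


def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=200):
    """Linear SVM fitted by fixed-step subgradient descent on the L2-regularised hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1 - eta * lam) for wj in w]  # gradient step on the L2 penalty
            if margin < 1:  # hinge loss is active for this sample
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi  # the bias is left unregularised
    return w, b


def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def loo_accuracy(X, y, features):
    # Leave-one-out estimate of generalisation accuracy using only the given feature columns.
    hits = 0
    for i in range(len(X)):
        train_X = [[row[f] for f in features] for k, row in enumerate(X) if k != i]
        train_y = [label for k, label in enumerate(y) if k != i]
        w, b = train_linear_svm(train_X, train_y)
        hits += predict(w, b, [X[i][f] for f in features]) == y[i]
    return hits / len(X)


def forward_selection(X, y):
    """Greedily add the feature that most improves LOO accuracy; stop when nothing improves."""
    selected: List[int] = []
    remaining = list(range(len(X[0])))
    best = 0.0
    while remaining:
        scores = [(loo_accuracy(X, y, selected + [cand]), cand) for cand in remaining]
        acc, chosen = max(scores, key=lambda s: s[0])  # ties go to the lowest feature index
        if acc <= best:
            break
        selected.append(chosen)
        remaining.remove(chosen)
        best = acc
    return selected, best


# Hypothetical data: young runners labelled +1, elderly runners labelled -1.
X = [
    [2.0, 0.3], [2.5, -0.2], [3.0, 0.1],
    [-2.0, 0.2], [-2.5, -0.1], [-3.0, 0.4],
]
y = [1, 1, 1, -1, -1, -1]

selected, acc = forward_selection(X, y)
print(selected, acc)  # [0] 1.0
```

Feature 0 separates the two groups on its own, so selection stops after one feature; in the study, forward selection reached 100% with six of the 31 variables.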

Relevance:

80.00%

Abstract:

The productivity of the disassembly methods commonly available today seldom makes disassembly the preferred end-of-life solution for massive take-back product streams. Systematic reuse of parts or components, or recycling of pure material fractions, is often not achievable in an economically sustainable way. In this paper, a case-based review of current disassembly practices is used to analyse the factors influencing disassembly feasibility. Data mining techniques were used to identify the major factors influencing the profitability of disassembly operations. Case characteristics such as the involvement of the product manufacturer in end-of-life treatment and continuous ownership are among the important dimensions. Economic models demonstrate that the efficiency of disassembly operations would have to increase by an order of magnitude to make ecologically preferred, disassembly-oriented end-of-life scenarios competitive for large waste electrical and electronic equipment (WEEE) streams. Technological means available to increase the productivity of disassembly operations are summarized. Automated disassembly techniques can contribute to the robustness of the process, but cannot close the efficiency gap unless they are combined with appropriate product design measures. Innovative, reversible joints, collectively activated by external trigger signals, form a promising approach to low-cost mass disassembly in this context. A short overview of the state of the art in the development of such self-disassembling joints is included. (c) 2008 CIRP.

Relevance:

80.00%

Abstract:

Several aspects of photoperception and light signal transduction have been elucidated by studies with model plants. However, the information available for economically important crops, such as Fabaceae species, is scarce. In order to incorporate the existing genomic tools into a strategy to advance soybean research, we have investigated publicly available expressed sequence tag (EST) sequence databases in order to identify Glycine max sequences related to genes involved in light-regulated developmental control in model plants. Approximately 38,000 sequences from open-access databases were investigated, and all bona fide and putative photoreceptor gene families were found in soybean sequence databases. We have identified G. max orthologs for several families of transcriptional regulators and cytoplasmic proteins mediating photoreceptor-induced responses, although some important Arabidopsis phytochrome-signaling components are absent. Moreover, in some cases soybean and Arabidopsis gene-family homologs appear to have undergone distinct expansion processes. We propose a working model of light perception, signal transduction and response elicitation in G. max, based on the key components identified from Arabidopsis. These results demonstrate the power of comparative genomics between model systems and crop species to elucidate several aspects of plant physiology and metabolism.

Relevance:

80.00%

Abstract:

We re-mapped the soils of the Murray-Darling Basin (MDB) in 1995-1998 with a minimum of new fieldwork, making the most of existing data. We collated existing digital soil maps and used inductive spatial modelling to predict soil types from those maps combined with environmental predictor variables. The predictor variables were lithology, Landsat Multi Spectral Scanner (Landsat MSS) imagery, the 9-s digital elevation model (DEM) of Australia and derived terrain attributes, all gridded to 250-m pixels. Because the basin-wide datasets were very large, data mining software was used for modelling. Rule induction by data mining was also used to define the spatial domain of extrapolation for extending soil-landscape models from existing soil maps. Procedures to estimate the uncertainty associated with the predictions and the quality of information for the new soil-landforms map of the MDB are described. (C) 2002 Elsevier Science B.V. All rights reserved.

Relevance:

80.00%

Abstract:

This paper describes the construction of Australia-wide soil property predictions from a compiled national soils point database. The properties considered include pH, organic carbon, total phosphorus, total nitrogen, thickness, texture and clay content. Many of these soil properties are used directly in environmental process modelling, including global climate change models. Models are constructed at 250-m resolution using decision trees, which relate each soil property to the environment through a suite of environmental predictors at the locations where measurements were observed. These models are then used to extend predictions to the continental extent by applying the derived rules to the exhaustively available environmental predictors. The methodology and performance are described in detail for pH and summarized for the other properties. Environmental variables are found to be important predictors even at the 250-m resolution at which they are available here, as they can describe the broad changes in soil properties.
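Decision-tree models of this kind split the predictor space into regions and predict the mean soil property within each. The sketch below is a minimal regression tree that picks splits by the largest reduction in squared error; it is not the software used in the paper, and the predictors (rainfall, elevation), depth limits and pH values are hypothetical.

```python
from statistics import mean
from typing import List, Union

Tree = Union[float, tuple]  # a leaf value or (feature, threshold, left, right)


def sse(values: List[float]) -> float:
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, depth=0, max_depth=2, min_leaf=2) -> Tree:
    """Grow a regression tree by choosing, at each node, the split with the lowest total squared error."""
    if depth == max_depth or len(y) < 2 * min_leaf:
        return mean(y)  # leaf: predict the mean observed value
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, t, left, right)
    if best is None:
        return mean(y)
    _, f, t, left, right = best
    return (
        f,
        t,
        build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth, min_leaf),
        build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth, min_leaf),
    )


def predict(tree: Tree, x) -> float:
    # Follow the threshold tests down to a leaf.
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree


# Hypothetical observation sites: [annual rainfall (mm), elevation (m)] and measured pH.
X = [[300, 200], [350, 250], [400, 150], [450, 300], [900, 400], [950, 350], [1000, 500], [1100, 450]]
y = [8.2, 8.0, 8.1, 7.9, 5.5, 5.7, 5.4, 5.6]

tree = build_tree(X, y)
print(predict(tree, [320, 180]), predict(tree, [1050, 420]))
```

Here the root split separates dry, alkaline sites from wet, acidic ones (rainfall ≤ 450 mm). Applying `predict` to every grid cell of the predictor layers corresponds to the extension step the abstract describes.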

Relevance:

80.00%

Abstract:

Study Design: Data mining of single nucleotide polymorphisms (SNPs) in gene pathways related to spinal cord injury (SCI). Objectives: To identify gene polymorphisms putatively implicated in neuronal damage evolution pathways and potentially useful to SCI research. Setting: Departments of Psychiatry and Orthopedics, Faculdade de Medicina, Universidade de Sao Paulo, Brazil. Methods: Genes involved in processes related to SCI, such as apoptosis, inflammatory response, axonogenesis, peripheral nervous system development and axon ensheathment, were determined by evaluating the 'Biological Process' annotation of the Gene Ontology (GO). Each gene in these pathways was mapped using MapViewer, and the gene coordinates were used to identify its polymorphisms in the SNP database. As a proof of concept, the frequency of a subset of SNPs located in four genes (ALOX12, APOE, BDNF and NINJ1) was evaluated in the DNA of a group of 28 SCI patients and 38 individuals with no spinal cord lesions. Results: We identified a total of 95,276 SNPs in a set of 588 genes associated with the selected GO terms, including 3,912 nucleotide alterations located in coding regions. The five non-synonymous SNPs genotyped in our small group of patients showed a significant frequency, reinforcing their potential use for investigating SCI evolution. Conclusion: Despite the importance of SNPs in many aspects of gene expression and protein activity, these gene alterations have not been explored in SCI research. Here we describe a set of potentially useful SNPs, some of which could underlie the genetic mechanisms involved in post-traumatic spinal cord damage.
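Finding the SNPs of each gene from its coordinates is an interval lookup. A minimal sketch, assuming closed [start, end] intervals and SNP positions on the same coordinate system; the gene names, SNP identifiers and coordinates are hypothetical placeholders, not MapViewer or dbSNP data.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from typing import Dict, List, Tuple

Gene = Tuple[str, str, int, int]  # (name, chromosome, start, end), closed interval
Snp = Tuple[str, str, int]        # (snp_id, chromosome, position)


def snps_in_genes(genes: List[Gene], snps: List[Snp]) -> Dict[str, List[str]]:
    """Map each gene to the SNPs whose position falls inside its [start, end] coordinates."""
    by_chrom: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for snp_id, chrom, pos in snps:
        by_chrom[chrom].append((pos, snp_id))
    # Sort once per chromosome so each gene lookup is two binary searches.
    positions = {c: sorted(entries) for c, entries in by_chrom.items()}
    keys = {c: [p for p, _ in entries] for c, entries in positions.items()}
    hits = {}
    for name, chrom, start, end in genes:
        ks = keys.get(chrom, [])
        lo, hi = bisect_left(ks, start), bisect_right(ks, end)
        hits[name] = [snp_id for _, snp_id in positions.get(chrom, [])[lo:hi]]
    return hits


# Hypothetical genes and SNPs.
genes = [("GENE_A", "chr1", 1000, 2000), ("GENE_B", "chr1", 5000, 5600), ("GENE_C", "chr2", 300, 900)]
snps = [("snp1", "chr1", 1500), ("snp2", "chr1", 2000), ("snp3", "chr1", 3000),
        ("snp4", "chr2", 450), ("snp5", "chr1", 999)]

hits = snps_in_genes(genes, snps)
print(hits)  # {'GENE_A': ['snp1', 'snp2'], 'GENE_B': [], 'GENE_C': ['snp4']}
```

Splitting the hits further into coding and non-coding alterations would need exon coordinates as a second set of intervals, queried the same way.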

Relevance:

80.00%

Abstract:

A proportion of melanoma-prone individuals in both familial and non-familial contexts has been shown to carry inactivating mutations in either CDKN2A or, rarely, CDK4. CDKN2A is a complex locus that encodes two unrelated proteins from alternately spliced transcripts that are read in different frames. The alpha transcript (exons 1alpha, 2 and 3) produces the p16INK4A cyclin-dependent kinase inhibitor, while the beta transcript (exons 1beta and 2) is translated as p14ARF, which stabilizes p53 levels by binding to MDM2. Mutations in exon 2 can impair both polypeptides, and insertions and deletions in exons 1alpha, 1beta and 2 can theoretically generate p16INK4A-p14ARF fusion proteins. No online database currently takes into account all the consequences of these genotypes, a situation compounded by some problematic previous annotations of CDKN2A-related sequences and descriptions of their mutations. As an initiative of the international Melanoma Genetics Consortium, we have therefore established a database of germline variants observed in all loci implicated in familial melanoma susceptibility. Such a comprehensive, publicly accessible database is an essential foundation for research on melanoma susceptibility and its clinical application. Our database serves two types of data as defined by HUGO. The core dataset includes the nucleotide variants at the genomic and transcript levels, amino acid variants and citations. The ancillary dataset includes keyword descriptions of events at the transcription and translation levels and epidemiological data. The application that handles users' queries was designed using the model-view-controller architecture and implemented in Java. The object-relational database schema was derived using functional dependency analysis. We hereby present our first functional prototype of eMelanoBase. The service is accessible via the URL www.wmi.usyd.edu.au:8080/melanoma.html.
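Functional dependency analysis typically relies on computing attribute closures to find candidate keys and to spot dependencies that call for separate tables. The sketch below shows that step; the attributes and dependencies are hypothetical and are not the eMelanoBase schema.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]  # (left-hand side, right-hand side)


def closure(attributes: Iterable[str], fds: List[FD]) -> Set[str]:
    """Attribute closure X+: every attribute functionally determined by X."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attributes: Iterable[str], all_attributes: Iterable[str], fds: List[FD]) -> bool:
    return closure(attributes, fds) >= set(all_attributes)


# Hypothetical attributes and dependencies for a variant table.
ALL = {"variant_id", "gene", "chromosome", "transcript", "genomic_change", "protein_change"}
FDS: List[FD] = [
    (frozenset({"variant_id"}), frozenset({"gene", "genomic_change"})),
    (frozenset({"gene"}), frozenset({"chromosome"})),
    (frozenset({"genomic_change", "transcript"}), frozenset({"protein_change"})),
]

print(sorted(closure({"variant_id"}, FDS)))
print(is_superkey({"variant_id", "transcript"}, ALL, FDS))  # True
```

Here `variant_id` alone is not a key, because the protein change also depends on the transcript, which mirrors the CDKN2A case where one genomic change can affect p16INK4A and p14ARF differently. The dependency gene → chromosome is transitive through `variant_id`, a hint that gene data belong in a separate table.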

Relevance:

80.00%

Abstract:

Benchmarking is an important tool for organisations seeking to improve their productivity, product quality, process efficiency or services. Through benchmarking, organisations can compare their performance with that of competitors and identify their strengths and weaknesses. This study carries out a benchmarking analysis of the main Iberian seaports, with a special focus on the efficiency of their container terminals. To this end, Data Envelopment Analysis (DEA) is used, since several researchers consider it the most effective method for quantifying a set of key performance indicators. To obtain a more reliable diagnostic tool, DEA is combined with data mining to compare the operational data of the seaports' container terminals in 2007. Since seaports are global logistics networks, performance evaluation is essential for effective decision-making aimed at improving their efficiency and, therefore, their competitiveness.
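DEA in general solves one linear program per decision-making unit. In the special case of a single input and a single output under constant returns to scale (the CCR model), the efficiency score reduces to each unit's output/input ratio divided by the best ratio in the sample, which the sketch below computes. The terminals, the input (quay cranes) and the output (TEU handled) are hypothetical; multiple inputs and outputs would require an LP solver such as `scipy.optimize.linprog`.

```python
from typing import Dict, Tuple


def ccr_efficiency_1x1(units: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """CCR (constant returns to scale) efficiency for one input and one output per unit."""
    # With a single input and output, the CCR score reduces to each unit's
    # output/input ratio divided by the best ratio observed among all units.
    ratios = {name: out / inp for name, (inp, out) in units.items()}
    best = max(ratios.values())
    return {name: r / best for name, r in ratios.items()}


# Hypothetical terminals: (quay cranes, thousand TEU handled per year).
terminals = {"Terminal A": (10, 2000), "Terminal B": (8, 2000), "Terminal C": (5, 500)}
scores = ccr_efficiency_1x1(terminals)
for name, score in sorted(scores.items()):
    print(f"{name}: {score:.2f}")
```

Terminal B defines the frontier (score 1.00). Read in input-oriented terms, Terminal C's score of 0.40 means that at Terminal B's productivity its output could be handled with 40% of its cranes.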

Relevance:

80.00%

Abstract:

The work presented here describes the creation of a model to underpin a decision support system for assessing the risk inherent in executing Information Technology (IT) projects, using data mining techniques. During a project's life cycle, numerous factors contribute to its success or failure. The responsibility for monitoring, anticipating and mitigating these factors falls on the Project Manager. Project management is a difficult and costly task: it consumes many resources, depends on numerous variables and often on the Project Manager's own experience. When presented with duration and effort estimates for a given task, the Project Manager, apart from personal perception and intuition, has no objective way of judging the plausibility of the values proposed by whoever will carry out the task. These estimates are fundamental to the organization, since they underpin decisions on overall strategic corporate planning, execution, postponement, cancellation, contract award, scope renegotiation and outsourcing, among others. A propensity for deviation, when detected at an early stage, can help to manage the risk associated with project management better. The success of each completed project was rated by weighting three factors: deviation from budget, deviation from schedule and deviation from specification. By analysing past projects and correlating some of their attributes with their degree of success, the model qualitatively classifies a new project according to its risk. In this context, risk represents how far the project is from success.
Using data mining algorithms such as classification trees and neural networks, we describe the development of a model that supports a decision support system based on the classification of new projects. The models result from an extensive set of validation tests in which the indicators that best characterize a project's attributes and most influence risk are sought and refined. The Weka 3 tool was used as the technological platform for development and testing. Sound use of the proposed model will make it possible to draw up more detailed contingency plans and to manage more closely those projects with a greater propensity for risk. The final result is thus intended to be one more tool at the Project Manager's disposal.
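The abstract states that project success was rated by weighting deviations from budget, schedule and specification, but it does not give the weights. The sketch below assumes hypothetical weights and thresholds to turn each closed project into a risk label; such labels would then serve as the class attribute that a classification tree or neural network (e.g. in Weka) learns to predict from project attributes.

```python
from dataclasses import dataclass

# Hypothetical weights and thresholds; the source does not state the values used.
WEIGHTS = {"budget": 0.4, "schedule": 0.4, "scope": 0.2}
LOW, HIGH = 0.10, 0.30


@dataclass
class ClosedProject:
    budget_dev: float    # (actual cost - budgeted cost) / budgeted cost
    schedule_dev: float  # (actual duration - planned duration) / planned duration
    scope_dev: float     # share of the specified scope not delivered, 0..1


def deviation_score(p: ClosedProject) -> float:
    # Weighted distance from success; only overruns count, so finishing early or under budget scores 0.
    return (WEIGHTS["budget"] * max(p.budget_dev, 0.0)
            + WEIGHTS["schedule"] * max(p.schedule_dev, 0.0)
            + WEIGHTS["scope"] * max(p.scope_dev, 0.0))


def risk_class(p: ClosedProject) -> str:
    score = deviation_score(p)
    if score < LOW:
        return "low"
    if score < HIGH:
        return "medium"
    return "high"


history = [
    ClosedProject(0.05, 0.10, 0.0),
    ClosedProject(0.20, 0.20, 0.0),
    ClosedProject(0.50, 0.25, 0.10),
]
print([risk_class(p) for p in history])  # ['low', 'medium', 'high']
```

Treating early or under-budget delivery as zero deviation is itself a design assumption; a symmetric penalty (absolute deviation) would also be defensible if overestimation is considered a planning failure.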

Relevance:

80.00%

Abstract:

In recent years the Data Mining field has grown and consolidated considerably. Several efforts have sought to establish standards in the area, among them SEMMA and CRISP-DM. Both have emerged as industry standards and define a set of sequential steps intended to guide the implementation of data mining applications. This raises the question of whether they differ substantially from the traditional KDD process. This paper draws a parallel between these methodologies and the KDD process and examines the similarities among them.
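A correspondence between the stages can be summarised as a lookup table. The mapping below is one alignment often drawn between the three processes, given here as an assumption rather than as this paper's own table: SEMMA covers the core modelling stages, while CRISP-DM adds business understanding before them and deployment after them.

```python
from typing import Dict, Optional, Tuple

# KDD stage -> (SEMMA phase, CRISP-DM phase); None marks a stage with no counterpart.
CORRESPONDENCE: Dict[str, Tuple[Optional[str], Optional[str]]] = {
    "Pre-KDD": (None, "Business Understanding"),
    "Selection": ("Sample", "Data Understanding"),
    "Pre-processing": ("Explore", "Data Understanding"),
    "Transformation": ("Modify", "Data Preparation"),
    "Data Mining": ("Model", "Modeling"),
    "Interpretation/Evaluation": ("Assess", "Evaluation"),
    "Post-KDD": (None, "Deployment"),
}

for kdd, (semma, crisp) in CORRESPONDENCE.items():
    print(f"{kdd:<26} {semma or '-':<8} {crisp}")
```

Note that CRISP-DM's Data Understanding phase spans two KDD stages here, which is one reason the processes are comparable but not identical step for step.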
