919 results for genoma, genetica, dna, bioinformatica, mapreduce, snp, gwas, big data, sequenziamento, pipeline


Relevance: 100.00%

Abstract:

SOA (Service Oriented Architecture), workflow, the Semantic Web, and Grid computing are key enabling information technologies in the development of increasingly sophisticated e-Science infrastructures and application platforms. While the emergence of Cloud computing as a new computing paradigm has provided new directions and opportunities for e-Science infrastructure development, it also presents some challenges. Scientific researchers increasingly find it difficult to handle “big data” using traditional data processing techniques. Such challenges demonstrate the need for a comprehensive analysis of how the above-mentioned informatics techniques can be used to develop appropriate e-Science infrastructure and platforms in the context of Cloud computing. This survey paper describes recent research advances in applying informatics techniques to facilitate scientific research, particularly from the Cloud computing perspective. Our particular contributions include identifying associated research challenges and opportunities, presenting lessons learned, and describing our future vision for applying Cloud computing to e-Science. We believe our research findings can help indicate the future trend of e-Science and can inform funding and research directions on how to employ computing technologies in scientific research more appropriately. We point out open research issues in the hope of sparking new development and innovation in the e-Science field.

Relevance: 100.00%

Abstract:

Automatic generation of classification rules has become an increasingly popular technique in commercial applications such as Big Data analytics, rule-based expert systems and decision-making systems. However, a principal problem with most methods for generating classification rules is overfitting of the training data. When Big Data is involved, this may result in the generation of a large number of complex rules. This may not only increase computational cost but also lower the accuracy of predictions on unseen instances. This has led to the need for pruning methods that simplify rules. In addition, once generated, classification rules are used to make predictions. For efficiency, the first rule that fires should be found as quickly as possible when searching through a rule set, so a suitable structure is required to represent the rule set effectively. In this chapter, the authors introduce a unified framework for the construction of rule-based classification systems consisting of three operations on Big Data: rule generation, rule simplification and rule representation. The authors also review some existing methods and techniques used for each of the three operations and highlight their limitations. They introduce some novel methods and techniques they developed recently, and discuss these in comparison with existing ones with respect to efficient processing of Big Data.
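To make the "first rule that fires" idea concrete, here is a minimal sketch of prediction with an ordered rule list (a plain decision list), assuming each rule is a condition function paired with a class label and that a default class is returned when no rule fires. The attribute names and rules are hypothetical, and this is not the rule representation the authors propose; it only shows the baseline linear search whose cost grows with the number of rules.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical rule format: (condition over one instance, predicted class).
Rule = Tuple[Callable[[Dict[str, object]], bool], str]


def classify(rules: List[Rule], instance: Dict[str, object],
             default: Optional[str] = None) -> Optional[str]:
    """Scan the ordered rule list and return the class of the first rule that fires."""
    for condition, label in rules:
        if condition(instance):
            return label
    # No rule covered the instance: fall back to the default class.
    return default


# Hypothetical rules over hypothetical attributes.
rules: List[Rule] = [
    (lambda r: r["outlook"] == "sunny" and r["humidity"] == "high", "no"),
    (lambda r: r["outlook"] == "rainy" and bool(r["windy"]), "no"),
]

print(classify(rules, {"outlook": "sunny", "humidity": "high", "windy": False}, default="yes"))
```

In the worst case every rule is evaluated for every prediction, which is why a large, overfitted rule set is costly at prediction time as well as at training time.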

Relevance: 100.00%

Abstract:

The CHARMe project enables the annotation of climate data with key pieces of supporting information that we term “commentary”. Commentary reflects the experience that has built up in the user community, and can help new or less-expert users (such as consultants, SMEs, experts in other fields) to understand and interpret complex data. In the context of global climate services, the CHARMe system will record, retain and disseminate this commentary on climate datasets, and provide a means for feeding back this experience to the data providers. Based on novel linked data techniques and standards, the project has developed a core system, data model and suite of open-source tools to enable this information to be shared, discovered and exploited by the community.

Relevance: 100.00%

Abstract:

We present an overview of the MELODIES project, which is developing new data-intensive environmental services based on data from Earth Observation satellites, government databases, national and European agencies and more. We focus here on the capabilities and benefits of the project’s “technical platform”, which applies cloud computing and Linked Data technologies to enable the development of these services, providing flexibility and scalability.

Relevance: 100.00%

Abstract:

Advances in hardware technologies allow data to be captured and processed in real time, and the resulting high-throughput data streams require novel data mining approaches. The research area of Data Stream Mining (DSM) develops data mining algorithms that analyse these continuous streams of data in real time. The creation and real-time adaptation of classification models from data streams is one of the most challenging DSM tasks. Current classifiers for streaming data address this problem by using incremental learning algorithms. However, even though these algorithms are fast, they are challenged by high-velocity data streams, in which data instances arrive at a fast rate. This is problematic when applications require little or no delay between a change in the patterns of the stream and the classifier's absorption of that change. The scalability problems that traditional data mining algorithms for static (non-streaming) datasets face with Big Data have been addressed through the development of parallel classifiers. However, there is very little work on the parallelisation of data stream classification techniques. In this paper we investigate K-Nearest Neighbours (KNN) as the basis for a real-time adaptive and parallel methodology for scalable data stream classification tasks.
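A minimal sketch of why KNN suits adaptive stream classification, assuming a bounded sliding window of the most recent labelled instances (the `SlidingWindowKNN` class and its `k` and `window_size` parameters are hypothetical, not the paper's method). Old instances fall out of the window, so the model gradually tracks changes in the stream, and each distance computation is independent, which is what makes KNN naturally parallel.

```python
import heapq
import math
from collections import Counter, deque


class SlidingWindowKNN:
    """Illustrative KNN stream classifier over a bounded window of recent instances."""

    def __init__(self, k=3, window_size=100):
        self.k = k
        # deque(maxlen=...) evicts the oldest instance once full,
        # so outdated patterns are forgotten as the stream moves on.
        self.window = deque(maxlen=window_size)

    def learn_one(self, x, y):
        self.window.append((tuple(x), y))

    def predict_one(self, x):
        if not self.window:
            return None
        # Each distance is computed independently of the others; the window
        # could be partitioned across workers, each returning its local
        # k nearest, with the partial results merged afterwards.
        nearest = heapq.nsmallest(self.k, self.window,
                                  key=lambda item: math.dist(item[0], x))
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


model = SlidingWindowKNN(k=1, window_size=2)
model.learn_one((0, 0), "a")
model.learn_one((10, 10), "b")
print(model.predict_one((1, 1)))
```

In a prequential (test-then-train) loop, each arriving instance is first passed to `predict_one` and then to `learn_one`, so the classifier adapts with no separate retraining step.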

Relevance: 100.00%

Abstract:

The Environmental Data Abstraction Library (EDAL) provides a modular data management library for bringing new and diverse datatypes together for visualisation within numerous software packages, including the ncWMS viewing service, which already has very wide international uptake. The structure of EDAL is presented along with examples of its use to compare satellite, model and in situ data types within the same visualisation framework. We emphasize the value of this capability for cross-calibration of datasets and evaluation of model products against observations, including preparation for data assimilation.

Relevance: 100.00%

Abstract:

An important application of Big Data Analytics is the real-time analysis of streaming data. Streaming data imposes unique challenges on data mining algorithms: concept drift, the need to analyse data on the fly because streams are unbounded, and the need for scalable algorithms because of the potentially high throughput of data. Fast real-time classification algorithms that adapt to concept drift exist; however, most approaches are not naturally parallel and are thus limited in their scalability. This paper presents work on the Micro-Cluster Nearest Neighbour (MC-NN) classifier. MC-NN is based on an adaptive statistical data summary built from Micro-Clusters. MC-NN is very fast and adapts to concept drift while maintaining the parallel properties of the base KNN classifier. MC-NN is also competitive with existing data stream classifiers in terms of accuracy and speed.
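The micro-cluster idea can be illustrated with a simplified sketch, not the published MC-NN algorithm. Assumptions: each micro-cluster stores only a count, per-dimension linear and squared sums, a class label and an error counter; prediction uses the nearest centroid; a misclassification increments the error counter of the wrongly nearest cluster, and that cluster splits along its highest-variance dimension once the counter exceeds a hypothetical `error_threshold`. Other mechanics of the real classifier (for example, how outdated micro-clusters are handled) are not reproduced here.

```python
import math


class MicroCluster:
    """Statistical summary of absorbed instances: count, linear sum, squared sum."""

    def __init__(self, x, label):
        self.label = label
        self.n = 1
        self.ls = list(x)             # per-dimension linear sum
        self.ss = [v * v for v in x]  # per-dimension squared sum
        self.errors = 0

    def centroid(self):
        return [s / self.n for s in self.ls]

    def variances(self):
        return [ss / self.n - (ls / self.n) ** 2 for ls, ss in zip(self.ls, self.ss)]

    def absorb(self, x):
        # Incremental update: no raw instances are stored.
        self.n += 1
        for i, v in enumerate(x):
            self.ls[i] += v
            self.ss[i] += v * v


class MicroClusterNN:
    def __init__(self, error_threshold=2):
        self.clusters = []
        self.error_threshold = error_threshold  # hypothetical parameter

    def _nearest(self, x, label=None):
        candidates = [c for c in self.clusters if label is None or c.label == label]
        if not candidates:
            return None
        return min(candidates, key=lambda c: math.dist(c.centroid(), x))

    def predict_one(self, x):
        c = self._nearest(x)
        return None if c is None else c.label

    def learn_one(self, x, y):
        nearest = self._nearest(x)
        if nearest is not None and nearest.label == y:
            nearest.absorb(x)
            nearest.errors = max(0, nearest.errors - 1)  # reward a correct cluster
            return
        # Misclassified (or empty model): absorb into the nearest cluster of the true class.
        same = self._nearest(x, label=y)
        if same is None:
            self.clusters.append(MicroCluster(x, y))
        else:
            same.absorb(x)
        if nearest is not None:
            nearest.errors += 1
            if nearest.errors > self.error_threshold:
                self._split(nearest)

    def _split(self, c):
        # Replace c by two clusters offset by one standard deviation
        # along its highest-variance dimension.
        var = c.variances()
        d = max(range(len(var)), key=lambda i: var[i])
        spread = math.sqrt(max(var[d], 0.0))  # guard against tiny negative float error
        left, right = c.centroid(), c.centroid()
        left[d] -= spread
        right[d] += spread
        self.clusters.remove(c)
        self.clusters.extend([MicroCluster(left, c.label), MicroCluster(right, c.label)])
```

Because prediction only needs distances to a small set of centroids rather than to every stored instance, the summary stays cheap to query, and the cluster list could be partitioned across workers in the same way as a KNN instance store.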

Relevance: 100.00%

Abstract:

This study assesses Autism-Spectrum Quotient (AQ) scores in a ‘big data’ sample collected through the UK Channel 4 television website following the broadcast of a medical education program. We examine correlations between the AQ and age, sex, occupation, and UK geographic region in 450,394 individuals. We predicted that age and geography would not be correlated with AQ, whilst sex and occupation would be. The mean AQ score for the total sample was m = 19.83 (SD = 8.71), slightly higher than that reported in a previous systematic review of 6,900 individuals in a non-clinical sample (mean of means = 16.94). This likely reflects the fact that this big-data sample includes individuals with autism, who in the systematic review scored much higher (mean of means = 35.19). As predicted, sex and occupation differences were observed: on average, males (m = 21.55, SD = 8.82) scored higher than females (m = 18.95, SD = 8.52), and individuals working in STEM careers (m = 21.92, SD = 8.92) scored higher than individuals in non-STEM careers (m = 18.92, SD = 8.48). Also as predicted, age and geographic region were not meaningfully correlated with AQ. These results support previous findings relating to sex and STEM careers in the largest set of individuals for which AQ scores have been reported, and suggest that the AQ is a useful self-report measure of autistic traits.
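To give a rough sense of the size of the reported group differences, the means and SDs above can be turned into a standardized mean difference (Cohen's d). This is an illustrative calculation, not a result reported by the study: the abstract does not give the group sizes, so the sketch assumes equal group sizes when pooling the SDs.

```python
import math


def cohens_d_equal_n(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d with a pooled SD that assumes equal group sizes."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# Means and SDs as reported in the abstract.
d_sex = cohens_d_equal_n(21.55, 8.82, 18.95, 8.52)   # males vs females
d_stem = cohens_d_equal_n(21.92, 8.92, 18.92, 8.48)  # STEM vs non-STEM
print(f"sex: d = {d_sex:.2f}, STEM: d = {d_stem:.2f}")  # sex: d = 0.30, STEM: d = 0.34
```

Under this assumption both differences are roughly a third of a standard deviation, which is small to moderate by Cohen's conventional benchmarks; with 450,394 participants, even differences of this size are estimated very precisely.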

Relevance: 100.00%

Abstract:

This introduction to the Virtual Special Issue surveys the development of spatial housing economics from its roots in neo-classical theory, through more recent developments in social interactions modelling, and touches on the role of institutions, path dependence and economic history. The survey also points to some of the more promising future directions for the subject that are beginning to appear in the literature. The survey covers hedonic models, spatial econometrics, neighbourhood models, housing market areas, housing supply, models of segregation, migration, housing tenure, sub-national house price modelling (including the so-called ripple effect) and agent-based models. Possible future directions are set in the context of a selection of recent papers that have appeared in Urban Studies. Nevertheless, there are still important gaps in the literature that merit further attention, arising at least partly from emerging policy problems. These include more research on housing and biodiversity, the relationship between housing and civil unrest, the effects of changing age distributions (notably housing for the elderly) and the impact of different international institutional structures. Methodologically, developments in Big Data provide an exciting framework for future work.

Relevance: 100.00%

Abstract:

I consider the case for genuinely anonymous web searching. Big data seems to have it in for privacy. The story is well known, particularly since the dawn of the web. Vastly more personal information, monumental and quotidian, is gathered than in the pre-digital days. Once gathered it can be aggregated and analyzed to produce rich portraits, which in turn permit unnerving prediction of our future behavior. The new information can then be shared widely, limiting prospects and threatening autonomy. How should we respond? Following Nissenbaum (2011) and Brunton and Nissenbaum (2011 and 2013), I will argue that the proposed solutions—consent, anonymity as conventionally practiced, corporate best practices, and law—fail to protect us against routine surveillance of our online behavior. Brunton and Nissenbaum rightly maintain that, given the power imbalance between data holders and data subjects, obfuscation of one’s online activities is justified. Obfuscation works by generating “misleading, false, or ambiguous data with the intention of confusing an adversary or simply adding to the time or cost of separating good data from bad,” thus decreasing the value of the data collected (Brunton and Nissenbaum, 2011). The phenomenon is as old as the hills. Natural selection evidently blundered upon the tactic long ago. Take a savory butterfly whose markings mimic those of a toxic cousin. From the point of view of a would-be predator the data conveyed by the pattern is ambiguous. Is the bug lunch or potential last meal? In the light of the steep costs of a mistake, the savvy predator goes hungry. Online obfuscation works similarly, attempting for instance to disguise the surfer’s identity (Tor) or the nature of her queries (Howe and Nissenbaum 2009). Yet online obfuscation comes with significant social costs. First, it implies free riding. 
If I’ve installed an effective obfuscating program, I’m enjoying the benefits of an apparently free internet without paying the costs of surveillance, which are shifted entirely onto non-obfuscators. Second, it permits sketchy actors, from child pornographers to fraudsters, to operate with near impunity. Third, online merchants could plausibly claim that, when we shop online, surveillance is the price we pay for convenience. If we don’t like it, we should take our business to the local brick-and-mortar and pay with cash. Brunton and Nissenbaum have not fully addressed the last two costs. Nevertheless, I think the strict defender of online anonymity can meet these objections. Regarding the third, the future doesn’t bode well for offline shopping. Consider music and books. Intrepid shoppers can still find most of what they want in a book or record store. Soon, though, this will probably not be the case. And then there are those who, for perfectly good reasons, are sensitive about doing some of their shopping in person, perhaps because of their weight or sexual tastes. I argue that consumers should not have to pay the price of surveillance every time they want to buy that catchy new hit, that New York Times bestseller, or a sex toy.

Relevance: 100.00%

Abstract:

Data mining is a relatively new field of research whose objective is to acquire knowledge from large amounts of data. In medical and health care, owing to regulations and the availability of computers, large amounts of data are becoming available [27]. On the one hand, practitioners are expected to use all of this data in their work; on the other hand, such a large amount of data cannot be processed by humans quickly enough to make diagnoses, prognoses and treatment schedules. A major objective of this thesis is to evaluate data mining tools in medical and health care applications in order to develop a tool that can help make reasonably accurate decisions. Specifically, the goal is to find a pattern among patients who contracted pneumonia by clustering laboratory values recorded every day. This pattern can then be generalized to patients who have not been diagnosed with the disease but whose lab values show the same trend as those of pneumonia patients. For this work, 10 tables were extracted from a large database of a hospital in Jena. In the ICU (intensive care unit), the patient management system COPRA has been used. All the tables and data are stored in a German-language database.
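The clustering step can be sketched as plain k-means over each patient's series of daily lab values, assuming every series has the same length (missing days would first need imputation or resampling, which the thesis abstract does not describe). The data, the deterministic initialisation from the first k series and the `nearest_cluster` helper are all hypothetical; the thesis's actual tool and features are not shown here. The helper illustrates the generalization step: an undiagnosed patient whose series lands in the same cluster as pneumonia patients shows a similar trend.

```python
import math


def kmeans(series, k, iterations=20):
    """Plain k-means over equal-length series; initial centroids are the first k series."""
    centroids = [list(s) for s in series[:k]]
    labels = [0] * len(series)
    for _ in range(iterations):
        # Assignment step: each series joins its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda j: math.dist(s, centroids[j])) for s in series]
        # Update step: each centroid becomes the day-by-day mean of its members.
        for j in range(k):
            members = [s for s, lab in zip(series, labels) if lab == j]
            if members:
                centroids[j] = [sum(day) / len(members) for day in zip(*members)]
    return labels, centroids


def nearest_cluster(new_series, centroids):
    """Assign a new patient's series to the closest existing cluster."""
    return min(range(len(centroids)), key=lambda j: math.dist(new_series, centroids[j]))


# Hypothetical daily values of one lab parameter over four days, one row per patient.
patients = [
    [5, 6, 5, 6],
    [80, 120, 150, 110],
    [4, 5, 6, 5],
    [90, 130, 160, 100],
]
labels, centroids = kmeans(patients, k=2)
print(labels, nearest_cluster([85, 125, 155, 105], centroids))
```

Real lab data would also need per-parameter scaling before distances are computed, since parameters measured in different units would otherwise dominate the clustering unevenly.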

Relevance: 100.00%

Abstract:

Orange may be the new black, but as I have seen only five minutes of that show, I can’t really use it here. Besides, based on the five minutes I saw, I would assume it is a series written by males. Not since the Victoria’s Secret catalog have I seen so many women wearing fewer clothes, or engaging in so many unmentionable acts. I’ll stop there because my Victorianism is showing, I’m sure.

Relevance: 100.00%

Abstract:

Brazilians' greater access to the internet and the growing volume of content disseminated on the web have drawn attention to big data analysis. Markets, the press and governments increasingly turn to network analysis techniques to support decisions. But this practice carries risks of manipulation that are rarely considered. DAPP/FGV, a partner of O GLOBO in social network monitoring, has developed its own filtering mechanisms and identified at least 25% “online junk” in searches carried out over the past two weeks.

Relevance: 100.00%

Abstract:

In the modern Knowledge Economy, in the Era of Big Data, correctly understanding the use and management of Information and Communication Technology (ICT), based on the academic field of Information Systems (IS) studies, is becoming increasingly relevant and strategic for organizations that intend to remain in operation, be able to meet new (internal and external) demands, and cope with complex changes in market competition. This research uses the stages-of-growth theory, grounded in the studies of Richard L. Nolan in the 1970s. The academic literature on stages-of-growth models and the context of the IS field of study provide the conceptual foundations of this study. The research identifies a model, with its constructs, related to the stages of growth of organizational ICT/IS initiatives, starting from Nolan's second-level benchmark variables, and proposes its operationalization through the creation and development of a scale. Exploratory and descriptive in nature, the research makes a theoretical contribution to the stages-of-growth paradigm by adding a new growth process to its conceptual structure. As a result, in addition to a bilingual (Portuguese and English) scale instrument, it provides recommendations and rules for applying a survey-type research instrument in the continuation of this study. As a general implication, it is expected that the use and application of this research to measure the stage level of ICT/IS in organizations can help two profiles of individuals: academics who study this topic, and practitioners seeking answers regarding their practical actions in the organizations where they work.