842 results for Data Mining, Yield Improvement, Self Organising Map, Clustering Quality


Relevance:

100.00%

Publisher:

Abstract:

This presentation aims to give a brief overview of the range of available tools, the terminology used and, more generally, the methodological framework of exploratory statistics and data analysis, the paradigm of the discipline. Over recent years the discipline has not been turned upside down, but it does nonetheless require continuous updating. Some tools that had barely been sketched out have since been forged and tested, and new application domains have appeared. The relationship with its dynamic neighbouring competitors (artificial intelligence, neural networks, Data Mining) needs to be clarified. The perspective on data analysis methods that I present obviously stems from one particular point of view; other points of view may be equally valid.

Relevance:

100.00%

Publisher:

Abstract:

The objective of this work was to evaluate the effect of pond management on fish feed, growth, yield, survival, and water and effluent quality during tambaqui (Colossoma macropomum) juvenile production. Fish were distributed in nine 600 m² earthen ponds at a density of 8 fish per m²; the rearing period was 60 days. Three different pond management strategies were applied: limed and fertilized (LimFer), limed (Lim), and natural (Nat). Fish were fed a commercial ration containing 34% crude protein three times daily. There were no significant differences in fish growth or yield. The three main items found in tambaqui stomachs were insects, zooplankton, and ration, with no significant difference in their proportions among treatments. Alkalinity, hardness, and CO2 were greater in LimFer and Lim ponds. Chlorophyll a, transparency, ammonia, nitrite, temperature, and dissolved oxygen of pond water did not differ significantly among treatments. Biochemical oxygen demand, total phosphorus, orthophosphate, ammonia, and nitrite were significantly greater in effluents from LimFer ponds. Pond fertilization should therefore be avoided: growth and yield were similar among the three pond management systems tested, and fertilization produces a more polluting effluent.

Relevance:

100.00%

Publisher:

Abstract:

The European Space Agency's Gaia mission will create the largest and most precise three-dimensional chart of our galaxy (the Milky Way) by providing unprecedented position, parallax, proper motion, and radial velocity measurements for about one billion stars. The resulting catalogue will be made available to the scientific community and will be analyzed in many different ways, including the production of a variety of statistics. The latter will often entail the generation of multidimensional histograms and hypercubes as part of the precomputed statistics for each data release, or for scientific analysis involving either the final data products or the raw data coming from the satellite instruments. In this paper we present and analyze a generic framework that allows the hypercube generation to be easily done within a MapReduce infrastructure, providing all the advantages of the new Big Data analysis paradigm but without dealing with any specific interface to the lower-level distributed system implementation (Hadoop). Furthermore, we show how executing the framework for different data storage model configurations (i.e., row- or column-oriented) and compression techniques can considerably improve the response time of this type of workload for the currently available simulated data of the mission. In addition, we put forward the advantages and shortcomings of deploying the framework on a public cloud provider, benchmark it against other popular solutions (which are not always the best for such ad hoc applications), and describe some user experiences with the framework, which was employed for a number of dedicated workshops on astronomical data analysis techniques.
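The hypercube generation described here can be sketched as a minimal MapReduce-style computation over star records. The record layout, bin widths, and function names below are illustrative assumptions, not the framework's actual interface:

```python
from collections import Counter
from functools import reduce

def map_phase(record):
    """Map step: bin one star record into a hypercube cell.
    Hypothetical layout: already-discretised (ra, dec, magnitude)."""
    ra, dec, mag = record
    cell = (ra // 10, dec // 10, mag)
    return Counter({cell: 1})

def reduce_phase(acc, partial):
    """Reduce step: merge partial per-cell counts into the hypercube."""
    acc.update(partial)
    return acc

records = [(12, 45, 3), (13, 44, 3), (170, -30, 7)]
hypercube = reduce(reduce_phase, (map_phase(r) for r in records), Counter())
# The first two stars fall into the same cell, so hypercube[(1, 4, 3)] == 2.
```

In a real MapReduce deployment the map and reduce functions would run distributed over partitions of the catalogue; the per-cell counts merge associatively, which is what makes the workload a natural fit for the paradigm.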

Relevance:

100.00%

Publisher:

Abstract:

In recent years, studies into the reasons for dropping out of higher education (including online education) have been undertaken with greater regularity, in parallel with the rise in the relative weight of this type of education compared with brick-and-mortar education. However, the work invested in characterising the students who drop out, compared with those who do not, appears not to have had the same prominence as that invested in the analysis of the causes. The definition of dropping out is very sensitive to the context. In this article, we reach a purely empirical definition of student drop-out, based on the probability of not continuing a specific academic programme following several consecutive semesters of "theoretical break". Dropping out should be properly defined before analysing its causes, as well as before comparing drop-out rates between different online programmes, or between online and on-campus ones. Our results show that there are significant differences among programmes depending on their theoretical duration, but not on their domain of knowledge.
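The empirical notion of drop-out based on consecutive break semesters can be made concrete with a small sketch. The data layout (per-student sorted lists of enrolled semester indices) and the function name are assumptions for illustration, not the article's actual formulation:

```python
def dropout_probability(histories, k, horizon):
    """Empirical probability that a break of >= k consecutive semesters
    is never followed by re-enrolment, counted per break, within the
    observation horizon (a hypothetical operationalisation)."""
    at_risk = lost = 0
    for semesters in histories:  # sorted semester indices a student attended
        # Breaks between two enrolments ended in a return.
        for a, b in zip(semesters, semesters[1:]):
            if b - a - 1 >= k:
                at_risk += 1
        # A trailing break with no later enrolment counts as a loss.
        if horizon - semesters[-1] - 1 >= k:
            at_risk += 1
            lost += 1
    return lost / at_risk if at_risk else 0.0

histories = [[0, 1, 2], [0, 1, 5], [0, 1]]
p = dropout_probability(histories, k=3, horizon=8)  # 2 of 3 long breaks final
```

Sweeping `k` over such data yields the probability curve from which a programme-specific drop-out threshold can be read off empirically.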

Relevance:

100.00%

Publisher:

Abstract:

This master's thesis covers the concepts of knowledge discovery, data mining, and technology forecasting in telecommunications. It covers the various aspects of knowledge discovery in databases and discusses in detail the data mining and technology forecasting methods used in telecommunications. The main concern throughout this thesis is to emphasise the methods used in technology forecasting and data mining for telecommunications. It also tries to answer, to some extent, the question of whether forecasts create a future, and describes a few difficulties that arise in technology forecasting. This thesis was done as part of my master's studies at Lappeenranta University of Technology.

Relevance:

100.00%

Publisher:

Abstract:

BACKGROUND: Selective publication of studies, commonly called publication bias, is widely recognized. Over the years, a new nomenclature has developed for other types of bias related to non-publication or to distortion in the dissemination of research findings. However, several of these different biases are often still summarized by the term 'publication bias'. METHODS/DESIGN: As part of the OPEN Project (To Overcome failure to Publish nEgative fiNdings) we will conduct a systematic review with the following objectives:

- To systematically review highly cited articles that focus on non-publication of studies, and to present the various definitions of biases related to the dissemination of research findings contained in the articles identified.
- To develop and discuss a new framework on the nomenclature of various aspects of distortion in the dissemination process that leads to public availability of research findings, with an international group of experts in the context of the OPEN Project.

We will systematically search Web of Knowledge for highly cited articles that provide a definition of biases related to the dissemination of research findings. A specifically designed data extraction form will be developed and pilot-tested. Working in teams of two, we will independently extract relevant information from each eligible article. For the development of a new framework we will construct an initial table listing different levels and different hazards en route to making research findings public. An international group of experts will iteratively review the table and reflect on its content until no new insights emerge and consensus has been reached. DISCUSSION: Results are expected to be publicly available in mid-2013. This systematic review, together with the results of other systematic reviews of the OPEN Project, will serve as a basis for the development of future policies and guidelines regarding the assessment and prevention of publication bias.

Relevance:

100.00%

Publisher:

Abstract:

DDM is a framework that combines intelligent agents with traditional artificial intelligence algorithms such as classifiers. The central idea of this project is to create a multi-agent system that allows different views to be combined into a single one.
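One minimal way to reconcile several agents' "views" into a single decision is a majority vote over their individual labels. The agents and features below are hypothetical stand-ins, not the DDM project's actual components:

```python
from collections import Counter

def combine_views(predictions):
    """Merge the labels proposed by several agents into one decision
    by majority vote -- a simple way to reconcile differing views."""
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical agents, each classifying from a different feature view.
agents = [
    lambda x: "spam" if x["caps_ratio"] > 0.5 else "ham",
    lambda x: "spam" if x["links"] > 3 else "ham",
    lambda x: "spam" if x["sender_score"] < 0.2 else "ham",
]
message = {"caps_ratio": 0.7, "links": 5, "sender_score": 0.9}
decision = combine_views([agent(message) for agent in agents])  # "spam"
```

Weighted voting or stacking would be natural refinements, but the agent/combiner split above is the essential multi-agent structure.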

Relevance:

100.00%

Publisher:

Abstract:

Background: Current advances in genomics, proteomics and other areas of molecular biology make the identification and reconstruction of novel pathways an emerging area of great interest. One such class of pathways is involved in the biogenesis of iron-sulfur clusters (ISC). Results: Our goal is the development of a new approach based on the use and combination of mathematical, theoretical and computational methods to identify the topology of a target network. In this approach, mathematical models play a central role in the evaluation of the alternative network structures that arise from literature data-mining, phylogenetic profiling, structural methods, and human curation. As a test case, we reconstruct the topology of the reaction and regulatory network for the mitochondrial ISC biogenesis pathway in S. cerevisiae. Predictions regarding how proteins act in ISC biogenesis are validated by comparison with published experimental results. For example, the predicted roles of Arh1 and Yah1, and some of the interactions we predict for Grx5, match experimental evidence. A putative role for frataxin in directly regulating mitochondrial iron import is ruled out by our analysis, which also agrees with published experimental results. Additionally, we propose a number of experiments for testing other predictions and for further improving the identification of the network structure. Conclusion: We propose and apply an iterative in silico procedure for the predictive reconstruction of the network topology of metabolic pathways. The procedure combines structural bioinformatics tools and mathematical modeling techniques that allow the reconstruction of biochemical networks. Using ISC biogenesis in S. cerevisiae as a test case, we indicate how this procedure can be used to analyze and validate the network model against experimental results. Critical evaluation of the results obtained through this procedure allows new wet-lab experiments to be devised that confirm its predictions or provide alternative explanations, further improving the models.

Relevance:

100.00%

Publisher:

Abstract:

Year after year, the growing processing power of computers has made it possible to process spectral images, which are more detailed than greyscale and RGB colour images, in reasonable time and without great cost. The problem, however, is that storage and data-transfer media have not kept pace with this growth in processing power. One solution is to compress spectral images for storage and transmission. This thesis presents a method in which a spectral image is compressed in two stages: first by clustering with a self-organising map (SOM), and then by continuing the compression with conventional methods. The compression ratios obtained are significant while the distortion remains tolerable. The work was carried out in the Laboratory of Information Processing at the Department of Information Technology of Lappeenranta University of Technology, as part of a larger image-compression research project.
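The two-stage scheme (cluster the per-pixel spectra into a small codebook, then compress the resulting index map conventionally) can be sketched as follows. For brevity a plain k-means quantiser stands in for the SOM, and the toy two-band data and all names are illustrative:

```python
import random
import zlib

def dist(a, b):
    """Squared Euclidean distance between two spectra."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(spectra, k, iters=10):
    """Stage 1: cluster per-pixel spectra into k codebook vectors.
    (A plain k-means stand-in for the SOM used in the thesis.)"""
    random.seed(0)
    codebook = random.sample(spectra, k)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: dist(s, codebook[c]))
                  for s in spectra]
        for c in range(k):
            members = [s for s, a in zip(spectra, assign) if a == c]
            if members:
                codebook[c] = [sum(band) / len(members)
                               for band in zip(*members)]
    return codebook, assign

# Toy 2-band "image": 100 pixels drawn from four distinct spectra.
spectra = [[i % 4, (i % 4) * 2.0] for i in range(100)]
codebook, indices = quantize(spectra, k=4)

# Stage 2: entropy-code the index map with a conventional compressor.
compressed = zlib.compress(bytes(indices))
```

Only the small codebook plus the compressed index map need be stored, which is where the significant compression ratios come from; the distortion is the quantisation error of stage 1.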

Relevance:

100.00%

Publisher:

Abstract:

Background: Information about the composition of regulatory regions is of great value for designing experiments to functionally characterize gene expression. The multiplicity of available applications to predict transcription factor binding sites in a particular locus contrasts with the substantial computational expertise demanded to manipulate them, which may constitute a barrier for the experimental community. Results: CBS (Conserved regulatory Binding Sites, http://compfly.bio.ub.es/CBS) is a public platform of evolutionarily conserved binding sites and enhancers predicted in multiple Drosophila genomes, furnished with published chromatin signatures associated with transcriptionally active regions and other experimental sources of information. Rapid access to this novel body of knowledge through a user-friendly web interface enables non-expert users to identify the binding sequences available for any particular gene, transcription factor, or genome region. Conclusions: The CBS platform is a powerful resource that provides tools for mining individual sequences and groups of co-expressed genes with epigenomics information in order to conduct regulatory screenings in Drosophila.

Relevance:

100.00%

Publisher:

Abstract:

This study compares the impact of quality management tools on the performance of organisations utilising the ISO 9001:2000 standard as the basis for a quality-management system and those utilising the EFQM model for this purpose. A survey is conducted among 107 experienced and independent quality-management assessors. The study finds that organisations with quality-management systems based on the ISO 9001:2000 standard tend to use general-purpose qualitative tools, and that these have a relatively positive impact on their general performance. In contrast, organisations adopting the EFQM model tend to use more specialised quantitative tools, which produce significant improvements in specific aspects of their performance. The findings of the study will enable organisations to choose the most effective quality-improvement tools for their particular quality strategy.

Relevance:

100.00%

Publisher:

Abstract:

Open educational resources (OER) promise increased access, participation, quality, and relevance, in addition to cost reduction. These seemingly fantastic promises are based on the supposition that educators and learners will discover existing resources, improve them, and share the results, creating a virtuous cycle of improvement and re-use. By anecdotal metrics, existing web-scale search is not working for OER. This situation impairs the cycle underlying the promise of OER, endangering long-term growth and sustainability. While the scope of the problem is vast, targeted improvements in curation, indexing, and data exchange can improve the situation and create opportunities for further scale. I explore the ways in which the current system is inadequate, discuss areas for targeted improvement, and describe a prototype system built to test these ideas. I conclude with suggestions for further exploration and development.

Relevance:

100.00%

Publisher:

Abstract:

Integrated into a wider research effort assessing destabilizing and triggering factors in order to model cliff dynamics along the Dieppe shoreline in Upper Normandy, this study aims at testing boat-based mobile LiDAR capabilities by scanning 3D point clouds of the unstable coastal cliffs. Two acquisition campaigns were performed, in September 2012 and September 2013, scanning (1) a 30-km-long shoreline and (2) the same test cliffs under different environmental conditions and device settings. The potential of the collected data for 3D modelling, change detection and landslide monitoring was then assessed. By scanning close to the coast during favourable meteorological and marine conditions, mobile LiDAR devices are able to scan a long shoreline quickly, with a median point spacing of up to 10 cm. The acquired data are sufficiently detailed to map geomorphological features smaller than 0.5 m². Furthermore, the capability to detect rockfalls and erosion deposits (larger than a cubic metre) is confirmed: the classical approach of computing differences between sequential acquisitions reveals many cliff collapses between Pourville and Quiberville and only sparse changes between Dieppe and Belleville-sur-Mer, reflecting different rockfall susceptibilities. Finally, we also confirmed the capability of the boat-based mobile LiDAR technique to monitor single large changes, characterizing the geometry of the Dieppe landslide with its two main active scarps, a retrogression of up to 40 m, and about 100,000 m³ of eroded material.
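The "differences between sequential acquisitions" approach can be sketched as a nearest-neighbour distance test between two point clouds. The toy cliff data and threshold below are illustrative, and a real pipeline would use a k-d tree or octree rather than a brute-force search:

```python
import math

def nearest_distance(p, cloud):
    """Distance from point p to its nearest neighbour in the other scan
    (brute force; fine for a toy example)."""
    return min(math.dist(p, q) for q in cloud)

def detect_changes(scan_t0, scan_t1, threshold):
    """Flag points of the later scan lying farther than `threshold` from
    any point of the earlier scan -- candidate rockfall or erosion areas."""
    return [p for p in scan_t1 if nearest_distance(p, scan_t0) > threshold]

# Toy cliff face: a flat 1 m x 1 m patch sampled every 10 cm.
cliff_t0 = [(x * 0.1, 0.0, z * 0.1) for x in range(10) for z in range(10)]
# One corner region eroded 0.5 m inward between the two acquisitions.
cliff_t1 = [(x, y + (0.5 if x < 0.25 and z < 0.25 else 0.0), z)
            for x, y, z in cliff_t0]
changed = detect_changes(cliff_t0, cliff_t1, threshold=0.2)  # 9 moved points
```

Summing the volume swept by the flagged points (e.g. via a 2.5D grid) then gives eroded-volume estimates of the kind reported for the Dieppe landslide.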

Relevance:

100.00%

Publisher:

Abstract:

The extension of traditional data mining methods to time series has been effectively applied to a wide range of domains such as finance, econometrics, biology, security, and medicine. Many existing mining methods deal with the task of change-point detection, but very few provide a flexible approach. Querying specific change points with linguistic variables is particularly useful in crime analysis, where intuitive, understandable, and appropriate detection of changes can significantly improve the allocation of resources for timely and concise operations. In this paper, we propose an on-line method for detecting and querying change points in crime-related time series using a meaningful representation and a fuzzy inference system. Change-point detection is based on a shape-space representation, and linguistic terms describing the geometric properties of the change points are used to express queries, offering the advantages of intuitiveness and flexibility. An empirical evaluation is first conducted on a crime data set to confirm the validity of the proposed method, and then on a financial data set to test its general applicability. A comparison with a similar change-point detection algorithm and a sensitivity analysis are also conducted. Results show that the method is able to accurately detect change points at very low computational cost. More broadly, the detection of specific change points within time series of virtually any domain is made more intuitive and more understandable, even for experts not versed in data mining.
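The idea of querying change points with linguistic terms can be illustrated with a minimal sketch: a sliding-window slope serves as a crude stand-in for the paper's shape-space representation, and a triangular fuzzy set encodes the term "sharp increase". All names, fuzzy-set bounds, and data below are illustrative assumptions:

```python
def slope(window):
    """Least-squares slope of a short window -- a crude shape descriptor."""
    n = len(window)
    xm, ym = (n - 1) / 2, sum(window) / n
    num = sum((i - xm) * (y - ym) for i, y in enumerate(window))
    den = sum((i - xm) ** 2 for i in range(n))
    return num / den

def membership(value, a, b, c):
    """Triangular fuzzy membership over [a, c], peaking at b."""
    if value <= a or value >= c:
        return 0.0
    return (value - a) / (b - a) if value <= b else (c - value) / (c - b)

def query_sharp_increases(series, win=4, cut=0.5):
    """Indices whose local slope matches the linguistic term
    'sharp increase' to a degree above `cut`."""
    return [t for t in range(len(series) - win + 1)
            if membership(slope(series[t:t + win]), 1.0, 3.0, 5.0) > cut]

crimes = [5, 5, 6, 5, 8, 11, 14, 14, 13, 14]
hits = query_sharp_increases(crimes)  # windows starting at t = 3 and t = 4
```

Changing the fuzzy set (e.g. to "gradual decrease") changes the query without touching the detector, which is the flexibility the linguistic-variable approach provides.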