775 results for mining data streams
Abstract:
e-Learning platforms are increasingly used in distance education, a fact directly related to their ability to let students attend courses from anywhere. Within the scope of e-Learning platforms there is an especially interesting group: adaptive platforms, which tend to replace the (classroom) teacher through interactivity, variability of content, automation, and the capacity to solve problems and simulate educational behaviours. The ADAPT project (an adaptive e-Learning platform) consists of building one such platform, implementing intelligent tutoring, problem solving based on past experiences, genetic algorithms, and link mining. It is in the area of link mining that this dissertation arises, documenting the development of four distinct modules. The first module is a search engine for suggesting alternative content; the second identifies changes in learning style; the third is a data-analysis platform that implements several data mining and statistical techniques to give teachers/tutors important information that would not be visible without such techniques; finally, the last module is a recommender system that suggests to students the most suitable articles based on the queries of students with similar profiles. This thesis documents the development of prototypes for each of these modules. The tests performed for each module show that the methodologies used are valid and viable.
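The recommender module above, which suggests articles based on the queries of students with similar profiles, can be sketched as a simple user-based collaborative filter. Everything below (the profile vectors, the consulted-article sets, the cosine similarity choice) is an illustrative assumption, not the ADAPT implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length profile vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend(target, profiles, consulted, k=2):
    """Suggest articles consulted by the k students most similar to `target`
    that `target` has not seen yet."""
    peers = sorted(
        (s for s in profiles if s != target),
        key=lambda s: cosine(profiles[target], profiles[s]),
        reverse=True,
    )[:k]
    seen = consulted[target]
    suggestions = []
    for peer in peers:
        for article in consulted[peer]:
            if article not in seen and article not in suggestions:
                suggestions.append(article)
    return suggestions

# Hypothetical per-student feature vectors (e.g. topic interest scores)
profiles = {"ana": [5, 1, 0], "rui": [4, 2, 0], "eva": [0, 1, 5]}
consulted = {"ana": {"a1"}, "rui": {"a1", "a2"}, "eva": {"a3"}}
print(recommend("ana", profiles, consulted, k=1))  # → ['a2']
```

For "ana" the nearest peer is "rui", so the one article rui consulted that ana has not is suggested.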
Abstract:
The apposition compound eyes of gonodactyloid stomatopods are divided into a ventral and a dorsal hemisphere by six equatorial rows of enlarged ommatidia, the mid-band (MB). Whereas the hemispheres are specialized for spatial vision, the MB consists of four dorsal rows of ommatidia specialized for colour vision and two ventral rows specialized for polarization vision. The eight retinula cell axons (RCAs) from each ommatidium project retinotopically onto one corresponding lamina cartridge, so that the three retinal data streams (spatial, colour and polarization) remain anatomically separated. This study investigates whether the retinal specializations are reflected in differences in the RCA arrangement within the corresponding lamina cartridges. We have found that, in all three eye regions, the seven short visual fibres (svfs) formed by retinula cells 1-7 (R1-R7) terminate at two distinct lamina levels, geometrically separating the terminals of photoreceptors sensitive to either orthogonal e-vector directions or different wavelengths of light. This arrangement is required for the establishment of spectral and polarization opponency mechanisms. The long visual fibres (lvfs) of the eighth retinula cells (R8) pass through the lamina and project retinotopically to the distal medulla externa. Differences between the three eye regions exist in the packing of svf terminals and in the branching patterns of the lvfs within the lamina. We hypothesize that the R8 cells of MB rows 1-4 are incorporated into the colour vision system formed by R1-R7, whereas the R8 cells of MB rows 5 and 6 form a separate neural channel from R1 to R7 for polarization processing.
Abstract:
Sharing data among organizations often leads to mutual benefit. Recent technology in data mining has enabled efficient extraction of knowledge from large databases. This, however, increases the risk of disclosing sensitive knowledge when the database is released to other parties. To address this privacy issue, one may sanitize the original database so that the sensitive knowledge is hidden. The challenge is to minimize the side effect on the quality of the sanitized database so that non-sensitive knowledge can still be mined. In this paper, we study such a problem in the context of hiding sensitive frequent itemsets by judiciously modifying the transactions in the database. To preserve the non-sensitive frequent itemsets, we propose a border-based approach to efficiently evaluate the impact of any modification to the database during the hiding process. The quality of the database can be well maintained by greedily selecting the modifications with minimal side effect. Experimental results are also reported to show the effectiveness of the proposed approach. © 2005 IEEE
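As a rough illustration of the hiding problem, the sketch below greedily deletes single items from supporting transactions until a sensitive itemset falls under the support threshold, scoring each candidate deletion by how many to-be-preserved itemsets it would break. That naive impact score is a toy stand-in for the paper's border-based evaluation, and all names and data here are assumptions.

```python
def support(db, itemset):
    """Number of transactions containing every item of `itemset`."""
    s = frozenset(itemset)
    return sum(1 for t in db if s <= t)

def hide_itemset(db, sensitive, min_sup, preserve):
    """Greedily delete items from supporting transactions until `sensitive`
    drops below `min_sup`; each deletion is scored by how many `preserve`
    itemsets it knocks out of that transaction (toy impact measure)."""
    db = [set(t) for t in db]
    sens = frozenset(sensitive)
    while support(db, sens) >= min_sup:
        best = None  # (impact, transaction index, item to delete)
        for i, t in enumerate(db):
            if not sens <= t:
                continue
            for item in sorted(sens):
                trial = t - {item}
                impact = sum(1 for p in preserve
                             if frozenset(p) <= t and not frozenset(p) <= trial)
                if best is None or impact < best[0]:
                    best = (impact, i, item)
        _, i, item = best
        db[i].discard(item)
    return db

db = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}]
clean = hide_itemset(db, {"a", "b"}, min_sup=2, preserve=[{"b", "c"}])
```

On this toy database the sensitive itemset {a, b} is hidden by removing "a" from the first transaction, which leaves the support of the preserved itemset {b, c} untouched.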
Abstract:
A network concept is introduced that exploits transparent optical grooming of traffic between an access network and a metro core ring network. This network is enabled by an optical router that allows bufferless aggregation of metro network traffic into higher-capacity data streams for core network transmission. A key functionality of the router is WDM to time-division multiplexing (TDM) transmultiplexing.
Abstract:
A novel simple all-optical nonlinear pulse processing technique using loop mirror intensity filtering and nonlinear broadening in normal dispersion fiber is described. The pulse processor offers reamplification and cleaning up of the optical signals and phase margin improvement. The efficiency of the technique is demonstrated by application to 40-Gb/s return-to-zero optical data streams.
Abstract:
A novel all-optical regeneration technique using loop-mirror intensity-filtering and nonlinear broadening in normal-dispersion fibre is described. The device offers 2R-regeneration function and phase margin improvement. The technique is applied to 40Gbit/s return-to-zero optical data streams.
Abstract:
The problem of multi-agent routing in static telecommunication networks with fixed configuration is considered. The problem is formulated in two ways: for a centralized routing scheme with a coordinator agent (global routing) and for a distributed routing scheme with independent agents (local routing). For both schemes, appropriate Hopfield neural networks (HNNs) are constructed.
Abstract:
Oxidative post-translational modifications (oxPTMs) can alter the function of proteins, and are important in the redox regulation of cell behaviour. The most informative technique to detect and locate oxPTMs within proteins is mass spectrometry (MS). However, proteomic MS data are usually searched against theoretical databases using statistical search engines, and the occurrence of unspecified or multiple modifications, or other unexpected features, can lead to failure to detect the modifications and erroneous identifications of oxPTMs. We have developed a new approach for mining data from accurate mass instruments that allows multiple modifications to be examined. Accurate mass extracted ion chromatograms (XIC) for specific reporter ions from peptides containing oxPTMs were generated from standard LC-MSMS data acquired on a rapid-scanning high-resolution mass spectrometer (ABSciex 5600 Triple TOF). The method was tested using proteins from human plasma or isolated LDL. A variety of modifications including chlorotyrosine, nitrotyrosine, kynurenine, oxidation of lysine, and oxidized phospholipid adducts were detected. For example, the use of a reporter ion at 184.074 Da/e, corresponding to phosphocholine, was used to identify for the first time intact oxidized phosphatidylcholine adducts on LDL. In all cases the modifications were confirmed by manual sequencing. ApoB-100 containing oxidized lipid adducts was detected even in healthy human samples, as well as LDL from patients with chronic kidney disease. The accurate mass XIC method gave a lower false positive rate than normal database searching using statistical search engines, and identified more oxidatively modified peptides. A major advantage was that additional modifications could be searched after data collection, and multiple modifications on a single peptide identified. The oxPTMs present on albumin and ApoB-100 have potential as indicators of oxidative damage in ageing or inflammatory diseases.
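The core of the accurate-mass XIC screening described above, summing intensity inside a narrow m/z window around a reporter ion such as phosphocholine at 184.074, can be sketched as follows. The scan data and the 10 ppm tolerance are illustrative assumptions; a real pipeline would read instrument files (e.g. mzML) rather than inline tuples.

```python
def xic(scans, target_mz, ppm=10.0):
    """Accurate-mass extracted ion chromatogram: for each scan, sum the
    intensity of peaks within `ppm` of `target_mz`."""
    tol = target_mz * ppm / 1e6
    chrom = []
    for rt, peaks in scans:
        inten = sum(i for mz, i in peaks if abs(mz - target_mz) <= tol)
        chrom.append((rt, inten))
    return chrom

# Hypothetical MS/MS scans: (retention time, [(m/z, intensity), ...])
scans = [
    (1.0, [(184.0733, 500.0), (300.2, 80.0)]),
    (1.1, [(184.3, 900.0)]),           # outside the 10 ppm window
    (1.2, [(184.0740, 1200.0)]),
]
print(xic(scans, 184.074))
```

Only peaks within roughly 0.0018 Da of the reporter mass contribute, which is what separates an accurate-mass XIC from a wide nominal-mass filter.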
Abstract:
Large read-only or read-write transactions with a large read set and a small write set constitute an important class of transactions used in such applications as data mining, data warehousing, statistical applications, and report generators. Such transactions are best supported with optimistic concurrency, because locking large amounts of data for extended periods of time is not an acceptable solution. The abort rate in regular optimistic concurrency algorithms increases exponentially with the size of the transaction. The algorithm proposed in this dissertation solves this problem by using a new transaction scheduling technique that allows a large transaction to commit safely with a probability that can exceed that of regular optimistic concurrency algorithms by several orders of magnitude. A performance simulation study and a formal proof of serializability and external consistency of the proposed algorithm are also presented.
This dissertation also proposes a new query optimization technique (lazy queries). Lazy Queries is an adaptive query execution scheme that optimizes itself as the query runs. Lazy queries can be used to find an intersection of sub-queries very efficiently, without requiring full execution of large sub-queries or any statistical knowledge about the data.
An efficient optimistic concurrency control algorithm used in a massively parallel B-tree with variable-length keys is introduced. B-trees with variable-length keys can be used effectively in a variety of database types. In particular, we show how such a B-tree was used in our implementation of a semantic object-oriented DBMS. The concurrency control algorithm uses semantically safe optimistic virtual "locks" that achieve very fine granularity in conflict detection. This algorithm ensures serializability and external consistency by using logical clocks and backward validation of transactional queries. A formal proof of correctness of the proposed algorithm is also presented.
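The combination of logical clocks with backward validation mentioned above can be illustrated with a minimal in-memory validator: a committing transaction is checked against the write sets of transactions that committed after it started. This is a generic sketch of the technique, not the dissertation's algorithm (which adds scheduling for large transactions and semantic locks).

```python
class Validator:
    """Backward validation with a logical clock: commit succeeds only if no
    transaction that committed after this one started wrote into its read set."""

    def __init__(self):
        self.clock = 0
        self.log = []  # (commit timestamp, write set) of committed transactions

    def begin(self):
        """Start a transaction; its start timestamp is the current clock."""
        return self.clock

    def commit(self, start_ts, read_set, write_set):
        """Validate against later committers, then commit or abort."""
        for ts, ws in self.log:
            if ts > start_ts and read_set & ws:
                return False  # conflict with a concurrent committer: abort
        self.clock += 1
        self.log.append((self.clock, set(write_set)))
        return True

v = Validator()
t1 = v.begin()
t2 = v.begin()
ok1 = v.commit(t1, read_set={"x"}, write_set={"x"})  # no later writers: commits
ok2 = v.commit(t2, read_set={"x"}, write_set={"y"})  # its read of "x" is stale: aborts
```

The second transaction aborts because t1 committed a write to "x" after t2 started, exactly the overlap backward validation is designed to catch.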
Abstract:
In recent years, there has been an enormous growth of location-aware devices, such as GPS-embedded cell phones, mobile sensors and radio-frequency identification tags. The age of combining sensing, processing and communication in one device gives rise to a vast number of applications, leading to endless possibilities and a realization of mobile Wireless Sensor Network (mWSN) applications. As computing, sensing and communication become more ubiquitous, trajectory privacy becomes a critical piece of information and an important factor for commercial success. While on the move, sensor nodes continuously transmit data streams of sensed values and spatiotemporal information, known as "trajectory information". If adversaries can intercept this information, they can monitor the trajectory path and capture the location of the source node.
This research stems from the recognition that the wide applicability of mWSNs will remain elusive unless a trajectory privacy preservation mechanism is developed. The outcome seeks to lay a firm foundation in the field of trajectory privacy preservation in mWSNs against external and internal trajectory privacy attacks. First, to prevent external attacks, we investigated a context-based trajectory privacy-aware routing protocol to prevent eavesdropping attacks. Traditional shortest-path oriented routing algorithms give adversaries the possibility to locate the target node in a certain area. We designed a novel privacy-aware routing phase and utilized the trajectory dissimilarity between mobile nodes to mislead adversaries about the location where the message started its journey. Second, to detect internal attacks, we developed a software-based attestation solution to detect compromised nodes. We created a dynamic attestation node chain among neighboring nodes to examine the memory checksum of suspicious nodes. The computation time for memory traversal was improved compared to previous work. Finally, we revisited the trust issue in trajectory privacy preservation mechanism designs. We used Bayesian game theory to model and analyze the behaviour of cooperative, selfish and malicious nodes in trajectory privacy preservation activities.
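The memory-checksum attestation step can be illustrated with a nonce-keyed hash: the verifier challenges a suspicious node with a fresh random nonce, so a compromised node cannot precompute or replay a checksum taken over a clean memory image. This is a generic sketch of software-based attestation, not the dissertation's dynamic attestation node chain.

```python
import hashlib

def attest(memory: bytes, nonce: bytes) -> str:
    """Node side: checksum the memory image keyed by the verifier's nonce."""
    return hashlib.sha256(nonce + memory).hexdigest()

def verify(reported_checksum: str, expected_image: bytes, nonce: bytes) -> bool:
    """Verifier side: recompute over the known-good image and compare."""
    return reported_checksum == attest(expected_image, nonce)

# Hypothetical firmware images for illustration
clean = b"sensor-firmware-v1"
tampered = clean + b"+trojan"
nonce = b"fresh-random-challenge"
print(verify(attest(tampered, nonce), clean, nonce))  # → False
```

Because the nonce enters the hash before the memory contents, each challenge yields a new checksum, which is what forces the node to traverse its actual memory at attestation time.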
Abstract:
Software product line engineering brings advantages over traditional software development regarding the mass customization of system components. However, there are scenarios in which maintaining separate clones of a software system seems an easier and more flexible way to manage the variabilities of a software product line. This dissertation qualitatively evaluates an approach that aims to support the reconciliation of functionalities between cloned systems. The analyzed approach is based on mining data about the issues and source code of evolved cloned web systems. The next step is to process the merge conflicts collected by the approach, and not indicated by traditional version control systems, to identify potential integration problems in the cloned software systems. The results of the study show the feasibility of the approach for performing a systematic characterization and analysis of merge conflicts for large-scale web-based systems.
Abstract:
With tweet volumes reaching 500 million a day, sampling is inevitable for any application using Twitter data. Realizing this, data providers such as Twitter, Gnip and Boardreader license sampled data streams priced in accordance with the sample size. Big Data applications working with sampled data are interested in a sample large enough to be representative of the universal dataset. Previous work on the representativeness issue has considered ensuring that the global occurrence rates of key terms can be reliably estimated from the sample. Present technology allows sample size estimation in accordance with probabilistic bounds on occurrence rates for the case of uniform random sampling. In this paper, we consider the problem of further improving sample size estimates by leveraging stratification in Twitter data. We analyze our estimates through an extensive study using simulations and real-world data, establishing the superiority of our method over uniform random sampling. Our work provides the technical know-how for data providers to expand their portfolio to include stratified sampled datasets, while applications benefit by being able to monitor more topics/events at the same data and computing cost.
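One standard way stratification improves on uniform random sampling, consistent with the direction of this abstract though not necessarily its exact estimator, is Neyman allocation: a fixed sample budget is split across strata in proportion to stratum size times within-stratum variability. The stratum figures below are made up for illustration.

```python
def neyman_allocation(strata, total_n):
    """Split `total_n` samples across strata proportionally to N_h * S_h,
    where N_h is the stratum size and S_h its standard deviation."""
    weights = [n_h * s_h for n_h, s_h in strata]
    total = sum(weights)
    return [round(total_n * w / total) for w in weights]

# Hypothetical tweet strata: (stratum size, std dev of a term's occurrence rate)
strata = [(900_000, 1.0), (100_000, 9.0)]
print(neyman_allocation(strata, 100))  # → [50, 50]
```

Even though the second stratum holds only 10% of the tweets, its higher variability earns it half the budget, which is where the gain over uniform sampling at equal cost comes from.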
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-08