858 results for COMPUTER SCIENCE, THEORY
Abstract:
We review some issues related to the implications of different missing data mechanisms for statistical inference on contingency tables, and we use simulation studies to compare the results obtained under such models with those obtained when the units with missing data are disregarded. We confirm that although, in general, analyses under the correct missing at random (MAR) and missing completely at random (MCAR) models are more efficient even for small sample sizes, there are exceptions where they may not improve on the results obtained by ignoring the partially classified data. We show that under the missing not at random (MNAR) model, estimates on the boundary of the parameter space, as well as lack of identifiability of the parameters of saturated models, may be associated with undesirable asymptotic properties of maximum likelihood estimators and likelihood ratio tests; even in standard cases, the bias of the estimators may be low only for very large samples. We also show that the probability of a boundary solution under the correct MNAR model may be large even for large samples and that, consequently, we cannot always conclude that an MNAR model is misspecified because the estimate lies on the boundary of the parameter space.
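To illustrate what "analysis under MAR" means compared with discarding partially classified units, here is a minimal sketch of the standard EM algorithm for an I x J table with supplementary margins: some units are classified by row only, others by column only. It assumes the missingness is MAR (or MCAR) and does not model MNAR mechanisms; the function names are hypothetical and this is not the authors' code.

```python
from typing import List

Table = List[List[float]]


def em_supplementary_margins(full: Table, row_only: List[float],
                             col_only: List[float], iters: int = 200) -> Table:
    """EM estimate of cell probabilities p[i][j] for an I x J table when some
    units are classified only by row (row_only[i]) or only by column
    (col_only[j]), assuming the missingness is MAR (or MCAR)."""
    n_rows, n_cols = len(full), len(full[0])
    total = sum(map(sum, full)) + sum(row_only) + sum(col_only)
    p = [[1.0 / (n_rows * n_cols)] * n_cols for _ in range(n_rows)]
    for _ in range(iters):
        row_marg = [sum(p[i]) for i in range(n_rows)]
        col_marg = [sum(p[i][j] for i in range(n_rows)) for j in range(n_cols)]
        # E-step: spread each partially classified count over its row/column
        # in proportion to the current conditional cell probabilities.
        expected = [[full[i][j]
                     + row_only[i] * p[i][j] / row_marg[i]
                     + col_only[j] * p[i][j] / col_marg[j]
                     for j in range(n_cols)] for i in range(n_rows)]
        # M-step: multinomial MLE of the completed table.
        p = [[expected[i][j] / total for j in range(n_cols)] for i in range(n_rows)]
    return p


def complete_case(full: Table) -> Table:
    """Estimate that simply discards the partially classified units."""
    n = sum(map(sum, full))
    return [[x / n for x in row] for row in full]
```

With `full = [[10, 10], [10, 10]]` and ten extra units observed only in row 0, the EM estimate shifts mass toward row 0 (`[[0.3, 0.3], [0.2, 0.2]]`), whereas the complete-case estimate stays uniform at 0.25 per cell.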
Abstract:
Network design underlies the solution of many engineering and science problems. Several network design problems are known to be NP-hard, and population-based metaheuristics such as evolutionary algorithms (EAs) have been extensively investigated for such problems. These optimization methods simultaneously generate a large number of potential solutions to explore the search space in breadth and, consequently, to avoid local optima. Obtaining a potential solution usually involves constructing and maintaining several spanning trees or, more generally, spanning forests. To explore the search space efficiently, special data structures have been developed that provide operations for manipulating a set of spanning trees (a population). For a tree with $n$ nodes, the most efficient data structures available in the literature require $O(n)$ time to generate a new spanning tree that modifies an existing one and to store the new solution. We propose a new data structure, called node-depth-degree representation (NDDR), and we demonstrate that with this encoding, generating a new spanning forest requires $O(\sqrt{n})$ average time. Experiments with an EA based on NDDR applied to large-scale instances of the degree-constrained minimum spanning tree problem have shown that the implementation adds only small constants and lower-order terms to the theoretical bound.
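The abstract does not describe NDDR's internals. As a rough illustration of the kind of encoding involved, the sketch below stores a tree as (node, depth) pairs in depth-first preorder, in the style of the earlier node-depth encoding that NDDR is assumed to extend. In this layout every subtree occupies a contiguous slice, so moving a subtree is a slice removal plus a depth shift. This simple version runs in $O(n)$ and does not reproduce NDDR's $O(\sqrt{n})$ average-time machinery; the function names are hypothetical.

```python
from typing import List, Tuple

Encoding = List[Tuple[int, int]]  # (node, depth) pairs in depth-first preorder


def subtree_end(enc: Encoding, i: int) -> int:
    """Index one past the last entry of the subtree rooted at enc[i]."""
    root_depth = enc[i][1]
    j = i + 1
    while j < len(enc) and enc[j][1] > root_depth:
        j += 1
    return j


def prune_and_graft(enc: Encoding, i: int, new_parent: int) -> Encoding:
    """Return a new encoding in which the subtree rooted at position i hangs
    below node new_parent (the input encoding is left untouched, which suits
    population-based search where parents must survive)."""
    j = subtree_end(enc, i)
    subtree = enc[i:j]  # contiguous slice thanks to preorder layout
    if any(node == new_parent for node, _ in subtree):
        raise ValueError("new parent lies inside the pruned subtree")
    rest = enc[:i] + enc[j:]
    k = next(idx for idx, (node, _) in enumerate(rest) if node == new_parent)
    shift = rest[k][1] + 1 - subtree[0][1]  # re-base depths under the new parent
    moved = [(node, depth + shift) for node, depth in subtree]
    # Inserting right after the parent makes the subtree its first child,
    # which keeps the sequence a valid preorder.
    return rest[: k + 1] + moved + rest[k + 1 :]
```

For the tree 0 -> {1, 2}, 1 -> {3}, encoded as `[(0, 0), (1, 1), (3, 2), (2, 1)]`, moving the subtree rooted at node 1 under node 2 yields `[(0, 0), (2, 1), (1, 2), (3, 3)]`.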
Abstract:
We review recent visualization techniques aimed at supporting tasks that require the analysis of text documents, from approaches targeted at visually summarizing the relevant content of a single document to those aimed at assisting exploratory investigation of whole collections of documents. Techniques are organized according to their target input material (either single texts or collections of texts) and their focus, which may be on displaying content, emphasizing relevant relationships, highlighting the temporal evolution of a document or collection, or helping users to handle results from a query posed to a search engine. We describe the approaches adopted by the different techniques, briefly reviewing the strategies they employ to obtain meaningful text models, how they extract the information required to produce representative visualizations, the tasks they are intended to support, the interaction issues involved, and their strengths and limitations. Finally, we present a summary of the techniques, highlighting their goals and distinguishing characteristics. We also briefly discuss some open problems and research directions in the fields of visual text mining and text analytics.
Abstract:
Distributed Software Development (DSD) is a development strategy that meets globalization needs for increased productivity and reduced costs. However, temporal distance, geographical dispersion, and socio-cultural differences have intensified some challenges and, in particular, added new requirements related to the communication, coordination, and control of projects. Among these new demands is the need for a software process that adequately supports distributed software development. This paper presents an integrated approach to software development and testing that takes the peculiarities of distributed teams into account. The approach aims to support DSD by providing better project visibility, improving communication between the development and test teams, and minimizing ambiguity and difficulty in understanding artifacts and activities. The integrated approach was conceived on four pillars: (i) identifying the DSD peculiarities related to development and test processes; (ii) defining the elements needed to compose an integrated development and test approach that supports distributed teams; (iii) describing and specifying the approach's workflows, artifacts, and roles; and (iv) representing the approach appropriately so that it can be effectively communicated and understood.
Abstract:
Let $k$ and $l$ be positive integers. With a graph $G$, we associate the quantity $c_{k,l}(G)$, the number of $k$-colourings of the edge set of $G$ with no monochromatic matching of size $l$. Consider the function $c_{k,l}\colon \mathbb{N} \to \mathbb{N}$ given by $c_{k,l}(n) = \max\{c_{k,l}(G) : |V(G)| = n\}$, the maximum of $c_{k,l}(G)$ over all graphs $G$ on $n$ vertices. In this paper, we determine $c_{k,l}(n)$ and the corresponding extremal graphs for all large $n$ and all fixed values of $k$ and $l$.
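To make the definition of $c_{k,l}(G)$ concrete, here is a brute-force counter for tiny graphs. It enumerates all $k^{|E|}$ edge colourings and rejects those in which some colour class contains $l$ pairwise disjoint edges. It is exponential and only illustrates the definitions; it says nothing about the extremal results of the paper. The function names are hypothetical.

```python
from itertools import combinations, product
from typing import Sequence, Tuple

Edge = Tuple[int, int]


def has_matching(edges: Sequence[Edge], size: int) -> bool:
    """True if some `size` edges are pairwise vertex-disjoint (brute force)."""
    for chosen in combinations(edges, size):
        vertices = [v for edge in chosen for v in edge]
        if len(set(vertices)) == 2 * size:
            return True
    return False


def count_colourings(edges: Sequence[Edge], k: int, l: int) -> int:
    """c_{k,l}(G): k-colourings of the edges with no monochromatic l-matching."""
    total = 0
    for colouring in product(range(k), repeat=len(edges)):
        classes = [[e for e, c in zip(edges, colouring) if c == colour]
                   for colour in range(k)]
        if not any(has_matching(cls, l) for cls in classes):
            total += 1
    return total


def c_max(n: int, k: int, l: int) -> int:
    """c_{k,l}(n): maximum of c_{k,l}(G) over all graphs on n labelled vertices."""
    all_edges = list(combinations(range(n), 2))
    best = 0
    for r in range(len(all_edges) + 1):
        for edges in combinations(all_edges, r):
            best = max(best, count_colourings(edges, k, l))
    return best
```

For the triangle with $k = l = 2$, no two edges are disjoint, so all $2^3 = 8$ colourings count, and `c_max(3, 2, 2)` is 8. For the path 0-1-2-3, the only 2-matching is {01, 23}, so 4 of the 8 colourings survive.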
Abstract:
In past decades, efforts to quantify system complexity with a general tool have usually relied on Shannon's classical information framework, addressing the disorder of the system through the Boltzmann-Gibbs-Shannon entropy or one of its extensions. In recent years, however, there have been some attempts to quantify algorithmic complexity in quantum systems based on Kolmogorov algorithmic complexity, and these obtained results that disagree with the classical approach. We therefore propose a complexity measure based on the quantum information formalism that takes advantage of the generality of classical complexity measures and can express the complexity of these systems in a framework other than their algorithmic counterparts. To this end, the Shiner-Davison-Landsberg (SDL) complexity framework is combined with the linear entropy of the density operators representing the analyzed systems and with the tangle as the entanglement measure. The proposed measure is then applied to a family of maximally entangled mixed states.
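The abstract names the ingredients but not the formula that combines them, so the sketch below only computes them side by side for a two-qubit density matrix: the normalized linear entropy $\Delta = \frac{d}{d-1}(1 - \mathrm{Tr}\,\rho^2)$, the SDL measure $\Gamma = \Delta^\alpha (1-\Delta)^\beta$ with $\alpha = \beta = 1$ assumed, and the tangle $\tau = C^2$. The concurrence uses the closed form for X-shaped density matrices, which is valid only for that class (common MEMS parameterizations have this shape; general states need Wootters' full formula). The Werner family is used purely as a convenient test input, and all names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[complex]]  # 4x4 density matrix in the basis |00>,|01>,|10>,|11>


def normalized_linear_entropy(rho: Matrix) -> float:
    """Delta = d/(d-1) * (1 - Tr rho^2), which lies in [0, 1]."""
    d = len(rho)
    purity = sum(abs(x) ** 2 for row in rho for x in row)  # Tr(rho^2) for Hermitian rho
    return d / (d - 1) * (1.0 - purity)


def sdl_complexity(delta: float, alpha: float = 1.0, beta: float = 1.0) -> float:
    """SDL measure Gamma = Delta^alpha * (1 - Delta)^beta."""
    return delta ** alpha * (1.0 - delta) ** beta


def tangle_x_state(rho: Matrix) -> float:
    """Tangle (squared concurrence) of an X-shaped two-qubit density matrix."""
    c1 = abs(rho[0][3]) - math.sqrt(rho[1][1].real * rho[2][2].real)
    c2 = abs(rho[1][2]) - math.sqrt(rho[0][0].real * rho[3][3].real)
    concurrence = 2.0 * max(0.0, c1, c2)
    return concurrence ** 2


def werner(p: float) -> Matrix:
    """p |Phi+><Phi+| + (1 - p) I/4, with |Phi+> = (|00> + |11>)/sqrt(2)."""
    rho = [[0j] * 4 for _ in range(4)]
    for i in range(4):
        rho[i][i] += (1 - p) / 4
    for i in (0, 3):
        for j in (0, 3):
            rho[i][j] += p / 2
    return rho
```

For `werner(0.5)` this gives $\Delta = 0.75$, $\Gamma = 0.1875$ and $\tau = 0.0625$; the Bell state `werner(1.0)` has $\Gamma = 0$ (pure) and $\tau = 1$, showing how disorder-based complexity and entanglement can move independently.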
Abstract:
In this work, we present an implementation of quantum logic gates and algorithms in a system of three effective qubits, represented by an NMR quadrupolar nucleus with spin I = 7/2. To implement these protocols, we used strongly modulating pulses (SMP), and the various stages of each implementation were verified by quantum state tomography (QST). We present results for the computational basis states, the Toffoli logic gate, and the Deutsch-Jozsa and Grover algorithms. We also discuss the difficulties and advantages of implementing such protocols using the SMP technique in quadrupolar systems.
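As an idealized reference for what the tomography stages should reveal, the sketch below simulates three qubits (8 amplitudes, matching the 8 levels of a spin-7/2 nucleus) as a plain state vector. It applies a Toffoli gate, which swaps the basis states |110> and |111>, and runs Grover search with the standard floor(pi/4 * sqrt(8)) = 2 iterations. It ignores the SMP pulse design and NMR pseudo-pure-state preparation entirely; the index convention 4*q0 + 2*q1 + q2 and the function names are assumptions.

```python
import math
from typing import List


def toffoli(state: List[float]) -> List[float]:
    """Toffoli on 3 qubits (index = 4*q0 + 2*q1 + q2): flips q2 when q0 = q1 = 1,
    i.e. swaps the amplitudes of |110> (index 6) and |111> (index 7)."""
    out = list(state)
    out[6], out[7] = state[7], state[6]
    return out


def grover(marked: int, n_qubits: int = 3) -> List[float]:
    """Ideal Grover search; returns the measurement probabilities."""
    n = 2 ** n_qubits
    amps = [1 / math.sqrt(n)] * n  # uniform superposition (Hadamards on |000>)
    iterations = math.floor(math.pi / 4 * math.sqrt(n))
    for _ in range(iterations):
        amps[marked] = -amps[marked]          # oracle: phase flip on the marked state
        mean = sum(amps) / n
        amps = [2 * mean - a for a in amps]   # diffusion: inversion about the mean
    return [a * a for a in amps]
```

For any marked index, two iterations give a success probability of about 0.945 (that is, $\sin^2(5\theta)$ with $\sin\theta = 1/\sqrt{8}$), which is the ideal target against which a tomographically reconstructed final state could be compared.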
Abstract:
The ability to transmit and amplify weak signals is fundamental to signal processing in artificial devices in engineering. Using a multilayer feedforward network of coupled double-well oscillators as well as FitzHugh-Nagumo oscillators, we investigate the conditions under which a weak signal received by the first layer can be transmitted through the network with or without amplitude attenuation. We find that the coupling strength and the states of the nodes in the first layer act as two-state switches, which determine whether the transmission is significantly enhanced or exponentially decreased. We hope this finding will be useful for designing artificial signal amplifiers.
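A minimal numerical sketch of the double-well setting, under assumptions the abstract does not fix: every unit obeys the overdamped equation dx/dt = x - x^3 + drive; only the first layer receives the weak sinusoid; each later unit is diffusively coupled to the mean of the previous layer; and integration uses forward Euler. The two parameters the abstract calls switches map loosely to `coupling` and `first_state`. All parameter names and values are illustrative, and no particular outcome of the paper is claimed.

```python
import math
from typing import List


def layer_amplitudes(n_layers: int = 5, n_nodes: int = 4, coupling: float = 0.5,
                     amp: float = 0.1, omega: float = 0.5, first_state: float = 1.0,
                     x0: float = 1.0, dt: float = 0.01,
                     steps: int = 10000) -> List[float]:
    """Euler-integrate a feedforward stack of overdamped double-well units,
    dx/dt = x - x**3 + drive, and return each layer's response amplitude
    (half the range of the layer mean after discarding transients)."""
    x = [[first_state] * n_nodes] + [[x0] * n_nodes for _ in range(n_layers - 1)]
    lo = [math.inf] * n_layers
    hi = [-math.inf] * n_layers
    for step in range(steps):
        t = step * dt
        new_x = []
        for layer in range(n_layers):
            prev_mean = sum(x[layer - 1]) / n_nodes if layer > 0 else 0.0
            row = []
            for xi in x[layer]:
                if layer == 0:
                    drive = amp * math.sin(omega * t)  # weak signal enters layer 1 only
                else:
                    drive = coupling * (prev_mean - xi)  # diffusive feedforward coupling
                row.append(xi + dt * (xi - xi ** 3 + drive))
            new_x.append(row)
        x = new_x  # synchronous update: every layer used the previous step's states
        if step >= steps // 2:  # skip transients, then track the layer-mean range
            for layer in range(n_layers):
                mean = sum(x[layer]) / n_nodes
                lo[layer] = min(lo[layer], mean)
                hi[layer] = max(hi[layer], mean)
    return [(h - l) / 2 for h, l in zip(hi, lo)]
```

Sweeping `coupling` or changing `first_state` (for example placing the first layer in the opposite well or at the barrier) is one way to probe how layer-by-layer amplitudes respond in this toy version.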
Abstract:
Breakthrough advances in microprocessor technology and efficient power management have changed the course of processor development, with multi-core processor technology emerging to deliver higher levels of processing. The use of many-core technology has boosted the computing power provided by clusters of workstations or SMPs, delivering large computational power at an affordable cost using only commodity components. Different implementations of message-passing libraries and system software (including operating systems) are installed in such cluster and multi-cluster computing systems. To guarantee correct execution of a message-passing parallel application in a computing environment other than the one in which it was originally developed, the application code must be reviewed. In this paper, we propose a hybrid communication interfacing strategy to execute a parallel application on a group of computing nodes belonging to different clusters or multi-clusters (whose computing systems may be running different operating systems and MPI implementations), interconnected with public or private IP addresses, and responding interchangeably to user execution requests. Experimental results obtained by executing parallel benchmark applications demonstrate the feasibility and effectiveness of the proposed strategy.
Abstract:
Traditional supervised data classification considers only physical features (e.g., distance or similarity) of the input data. Here, this type of learning is called low level classification. On the other hand, the human (animal) brain performs both low and high orders of learning and readily identifies patterns according to the semantic meaning of the input data. Data classification that considers not only physical attributes but also pattern formation is, here, referred to as high level classification. In this paper, we propose a hybrid classification technique that combines both types of learning. The low level term can be implemented by any classification technique, while the high level term is realized by extracting features of the underlying network constructed from the input data. Thus, the former classifies test instances by their physical features or class topologies, while the latter measures the compliance of test instances with the pattern formation of the data. Our study shows that the proposed technique not only can perform classification according to pattern formation but also can improve the performance of traditional classification techniques. Furthermore, as the complexity of the class configuration increases (for example, through greater mixing among classes), a larger portion of the high level term is required for correct classification. This feature confirms that high level classification is especially important in complex classification situations. Finally, we show how the proposed technique can be employed in a real-world application, where it is capable of identifying variations and distortions in handwritten digit images. As a result, it improves the overall pattern recognition rate.
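A minimal sketch of the hybrid idea, under assumptions the abstract does not fix. The low level term is the class share among the k nearest training points. The high level term measures how little inserting the test point perturbs the average clustering coefficient of each class's kNN network; the paper's actual network measures may differ. The two terms are mixed with a weight `lam`, matching the abstract's point that harder class configurations call for a larger high level portion. All names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def knn_adjacency(points: Sequence[Point], k: int) -> Dict[int, Set[int]]:
    """Symmetric k-nearest-neighbour graph over the indices of `points`."""
    adj: Dict[int, Set[int]] = {i: set() for i in range(len(points))}
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        for j in others[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def avg_clustering(adj: Dict[int, Set[int]]) -> float:
    """Mean local clustering coefficient (nodes with degree < 2 count as 0)."""
    total = 0.0
    for nbrs in adj.values():
        d = len(nbrs)
        if d < 2:
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += links / (d * (d - 1) / 2)
    return total / len(adj) if adj else 0.0


def hybrid_classify(train: Sequence[Point], labels: Sequence[int], x: Point,
                    k: int = 3, lam: float = 0.5) -> int:
    classes = sorted(set(labels))
    # Low level term: share of the k nearest training points in each class.
    nearest = sorted(range(len(train)), key=lambda i: math.dist(x, train[i]))[:k]
    low = {c: sum(labels[i] == c for i in nearest) / k for c in classes}
    # High level term: compliance = how little x perturbs each class network.
    compliance = {}
    for c in classes:
        pts: List[Point] = [p for p, l in zip(train, labels) if l == c]
        before = avg_clustering(knn_adjacency(pts, k))
        after = avg_clustering(knn_adjacency(pts + [x], k))
        compliance[c] = 1.0 / (1.0 + abs(after - before))
    norm = sum(compliance.values())
    scores = {c: (1 - lam) * low[c] + lam * compliance[c] / norm for c in classes}
    return max(scores, key=scores.get)
```

Setting `lam = 0` recovers a plain kNN vote, while raising `lam` lets the network-structure term weigh more; choosing it per problem is left open here.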
Abstract:
We propose simple heuristics for the assembly line worker assignment and balancing problem. This problem typically occurs in assembly lines in sheltered work centers for the disabled. Unlike the well-known simple assembly line balancing problem, task execution times vary according to the assigned worker. We develop a constructive heuristic framework based on task and worker priority rules that define the order in which tasks and workers are assigned to the workstations. We present a number of such rules and compare their performance across three possible uses: as a stand-alone method, as an initial solution generator for metaheuristics, and as a decoder for a hybrid genetic algorithm. Our results show that the heuristics are fast, obtain good results as a stand-alone method, and are efficient when used as an initial solution generator or as a solution decoder within more elaborate approaches.
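A minimal sketch of a station-oriented constructive heuristic in this spirit, for the variant where each worker staffs exactly one station and the cycle time is minimized. For a trial cycle time, each station is filled greedily: every still-free worker takes tasks in a given task-priority order, subject to precedence and the remaining capacity. The worker who completes the most tasks (ties broken by the lower load) is kept. The cycle time is then raised from a simple lower bound until all tasks fit. Both priority rules are assumptions standing in for the rules compared in the paper, and all names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Set, Tuple

INF = math.inf  # marks a task the worker cannot execute
Stations = List[Tuple[int, List[int]]]  # (worker, tasks) per station, in line order


def construct(times: Sequence[Sequence[float]], preds: Sequence[Set[int]],
              cycle: float, task_order: Sequence[int]) -> Optional[Stations]:
    """Greedy assignment for a fixed cycle time; None if some task is left over."""
    n_workers, n_tasks = len(times), len(times[0])
    free_workers = set(range(n_workers))
    done: Set[int] = set()
    stations: Stations = []
    for _ in range(n_workers):  # one station per worker
        best = None
        for w in sorted(free_workers):
            load, chosen, placed = 0.0, [], set(done)
            progress = True
            while progress:
                progress = False
                for t in task_order:  # task priority rule: the given order
                    if t in placed or not preds[t] <= placed:
                        continue  # already placed or predecessors missing
                    if load + times[w][t] <= cycle:
                        chosen.append(t)
                        placed.add(t)
                        load += times[w][t]
                        progress = True
                        break  # restart from the highest-priority task
            key = (len(chosen), -load)  # worker priority rule (assumed)
            if best is None or key > best[0]:
                best = (key, w, chosen)
        _, w, chosen = best
        free_workers.remove(w)
        done.update(chosen)
        stations.append((w, chosen))
    return stations if len(done) == n_tasks else None


def min_cycle_time(times, preds, task_order, max_cycle: int):
    """Smallest integer cycle time for which the greedy construction succeeds."""
    n_workers, n_tasks = len(times), len(times[0])
    lower = math.ceil(sum(min(times[w][t] for w in range(n_workers))
                          for t in range(n_tasks)) / n_workers)
    for cycle in range(lower, max_cycle + 1):
        stations = construct(times, preds, cycle, task_order)
        if stations is not None:
            return cycle, stations
    return None
```

With three chained tasks, worker 0 times `[2, 3, INF]` and worker 1 times `[4, 1, 2]`, the search starts at the bound 3 and succeeds immediately: worker 0 takes task 0 and worker 1 takes tasks 1 and 2.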
Abstract:
Competitive learning is an important machine learning approach that is widely employed in artificial neural networks. In this paper, we present a rigorous definition of a new type of competitive learning scheme realized on large-scale networks. The model consists of several particles that walk within the network and compete with each other to occupy as many nodes as possible while attempting to reject intruder particles. Each particle's walking rule is a stochastic combination of random and preferential movements. The model has been applied to community detection and data clustering problems. Computer simulations reveal that the proposed technique detects communities and clusters with high precision and low computational complexity. Moreover, we have developed an efficient method for estimating the most likely number of clusters using an evaluator index that monitors the information generated by the competition process itself. We hope this paper will provide an alternative approach to the study of competitive learning.
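A heavily simplified sketch of the walking and competition rules, under assumed update rules. Each node keeps a domination vector over the particles that sums to 1. A visit raises the visitor's level by `delta` and lowers each rival's level evenly (floored at zero). Preferential moves favour neighbours the particle already dominates; random moves pick a neighbour uniformly. The energy and intruder-rejection mechanics and the cluster-number index from the paper are omitted. The graph must have no isolated nodes, and at least two particles are required. All names and parameter values are hypothetical.

```python
import random
from typing import Dict, List


def particle_competition(adj: Dict[int, List[int]], n_particles: int, steps: int,
                         p_pref: float = 0.6, delta: float = 0.1,
                         seed: int = 0) -> List[int]:
    """Return, for each node in sorted order, the index of the particle that
    dominates it at the end of the walk."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    k = n_particles
    dom = {v: [1.0 / k] * k for v in nodes}  # domination levels; each row sums to 1
    pos = rng.sample(nodes, k)               # distinct random starting nodes
    for _ in range(steps):
        for p in range(k):
            nbrs = adj[pos[p]]
            weights = [dom[u][p] for u in nbrs]
            if rng.random() < p_pref and sum(weights) > 0:
                nxt = rng.choices(nbrs, weights=weights)[0]  # preferential move
            else:
                nxt = rng.choice(nbrs)                        # random move
            row = dom[nxt]
            for q in range(k):  # the visit weakens every rival particle at nxt ...
                if q != p:
                    row[q] = max(0.0, row[q] - delta / (k - 1))
            row[p] = 1.0 - sum(row[q] for q in range(k) if q != p)  # ... and strengthens p
            pos[p] = nxt
    return [max(range(k), key=lambda q: dom[v][q]) for v in nodes]
```

Reading the returned particle indices as community labels gives the community-detection use described in the abstract. On two triangles joined by a single edge, one would expect the two triangles to end up with different owners, although this simplified sketch does not guarantee it.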
Abstract:
Semisupervised learning is a machine learning approach that can employ both labeled and unlabeled samples in the training process. In this paper, we propose a semisupervised data classification model based on a combined random-preferential walk of particles in a network (graph) constructed from the input dataset. Particles of the same class cooperate with each other, while particles of different classes compete to propagate class labels to the whole network. A rigorous model definition is provided via a nonlinear stochastic dynamical system, and a mathematical analysis of its behavior is carried out. A numerical validation presented in this paper confirms the theoretical predictions. An interesting feature of the competitive-cooperative mechanism is that the proposed model can achieve good classification rates while exhibiting a low order of computational complexity compared with other network-based semisupervised algorithms. Computer simulations conducted on synthetic and real-world datasets reveal the effectiveness of the model.
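A minimal sketch of the competitive-cooperative idea, under assumptions the abstract does not fix. The network is a symmetric k-nearest-neighbour graph built from the data. One particle starts at each labelled sample. Domination levels are kept per class rather than per particle, so particles of the same class reinforce each other (cooperation) while different classes compete. Labelled nodes keep fixed one-hot levels, and each unlabelled node takes the class that dominates it at the end. The paper's nonlinear dynamical-system formulation is not reproduced; at least two classes are required, and all names and parameter values are hypothetical.

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def knn_graph(points: Sequence[Point], k: int) -> Dict[int, List[int]]:
    """Symmetric k-nearest-neighbour graph as sorted adjacency lists."""
    adj = {i: set() for i in range(len(points))}
    for i, p in enumerate(points):
        nearest = sorted((j for j in range(len(points)) if j != i),
                         key=lambda j: math.dist(p, points[j]))[:k]
        for j in nearest:
            adj[i].add(j)
            adj[j].add(i)
    return {i: sorted(nbrs) for i, nbrs in adj.items()}


def particle_ssl(points: Sequence[Point], labels: Sequence[Optional[str]],
                 k: int = 2, steps: int = 300, p_pref: float = 0.6,
                 delta: float = 0.1, seed: int = 0) -> List[str]:
    rng = random.Random(seed)
    classes = sorted({l for l in labels if l is not None})
    n_cls = len(classes)
    adj = knn_graph(points, k)
    # Class-level domination: uniform for unlabelled nodes, fixed one-hot for labelled.
    dom = [[1.0 / n_cls] * n_cls if l is None else
           [1.0 if c == l else 0.0 for c in classes] for l in labels]
    particles = [classes.index(l) for l in labels if l is not None]
    pos = [v for v, l in enumerate(labels) if l is not None]  # start at own sample
    for _ in range(steps):
        for idx, c in enumerate(particles):
            nbrs = adj[pos[idx]]
            weights = [dom[u][c] for u in nbrs]
            if rng.random() < p_pref and sum(weights) > 0:
                nxt = rng.choices(nbrs, weights=weights)[0]  # preferential move
            else:
                nxt = rng.choice(nbrs)                        # random move
            if labels[nxt] is None:  # labelled nodes keep their fixed levels
                row = dom[nxt]
                for q in range(n_cls):
                    if q != c:
                        row[q] = max(0.0, row[q] - delta / (n_cls - 1))
                row[c] = 1.0 - sum(row[q] for q in range(n_cls) if q != c)
            pos[idx] = nxt
    return [classes[max(range(n_cls), key=lambda q: row[q])] for row in dom]
```

Because levels are shared per class, adding more labelled samples of a class adds more walkers pushing that same class, which is the cooperative half of the mechanism.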
Abstract:
In this paper, we discuss the problem of how to identify moments of interest in videos or live broadcast shows. The primary contribution is a system that allows users to personalize their programs with previously created media stickers, pieces of content that may be temporarily attached to the original video. We present the system's architecture and implementation, which provide users with operators for transparently annotating videos while watching them. We offered a soccer fan the opportunity to add stickers to the video while watching a live match: the user reported both enjoying the stickers and feeling comfortable using them during the match, which are relevant results even though the experience was not fully representative.
Abstract:
We present a generalized test case generation method, called the G method. Although inspired by the W method, the G method, unlike it, allows test suites to be generated even in the absence of characterization sets for the specification models. Instead, the G method relies on knowledge about the index of certain equivalences induced on the implementation models. We show that the W method can be derived from the G method as a particular case. Moreover, we discuss some naturally occurring infinite classes of FSM models for which the G method generates test suites that are exponentially more compact than those produced by the W method.
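The abstract does not spell out the G method, so the sketch below shows only the W method it generalizes, for a deterministic, completely specified Mealy machine. It builds a transition cover P from BFS access sequences, a characterization set W of pairwise distinguishing input words found by BFS over state pairs, and the suite P · Σ^{≤ m-n} · W, where `extra_states` stands for the assumed bound m - n on additional implementation states. W is not minimized here; the dictionary encoding and function names are assumptions.

```python
from collections import deque
from itertools import product
from typing import Dict, List, Optional, Sequence, Set, Tuple

Word = Tuple[str, ...]
Trans = Dict[Tuple[int, str], Tuple[int, int]]  # (state, input) -> (next state, output)


def access_sequences(trans: Trans, init: int, inputs: Sequence[str]) -> Dict[int, Word]:
    """Shortest input word reaching each reachable state (BFS)."""
    acc: Dict[int, Word] = {init: ()}
    queue = deque([init])
    while queue:
        s = queue.popleft()
        for a in inputs:
            nxt, _ = trans[(s, a)]
            if nxt not in acc:
                acc[nxt] = acc[s] + (a,)
                queue.append(nxt)
    return acc


def distinguishing_sequence(trans: Trans, inputs: Sequence[str],
                            s: int, t: int) -> Optional[Word]:
    """Shortest word on which s and t produce different outputs, or None."""
    seen = {(s, t)}
    queue = deque([(s, t, ())])
    while queue:
        x, y, word = queue.popleft()
        for a in inputs:
            (x2, ox), (y2, oy) = trans[(x, a)], trans[(y, a)]
            if ox != oy:
                return word + (a,)
            if (x2, y2) not in seen:
                seen.add((x2, y2))
                queue.append((x2, y2, word + (a,)))
    return None  # s and t are equivalent


def characterization_set(trans: Trans, states: List[int],
                         inputs: Sequence[str]) -> Set[Word]:
    w: Set[Word] = set()
    for i, s in enumerate(states):
        for t in states[i + 1:]:
            seq = distinguishing_sequence(trans, inputs, s, t)
            if seq is not None:
                w.add(seq)
    return w


def w_method(trans: Trans, init: int, states: List[int], inputs: Sequence[str],
             extra_states: int = 0) -> Set[Word]:
    acc = access_sequences(trans, init, inputs)
    cover = set(acc.values()) | {seq + (a,) for seq in acc.values() for a in inputs}
    w = characterization_set(trans, states, inputs)
    middles = [m for k in range(extra_states + 1) for m in product(inputs, repeat=k)]
    return {p + m + s for p in cover for m in middles for s in w}
```

For a two-state machine where only input `a` reveals the state through its output, W = {('a',)} and, with no extra states, the suite has five words. The G method's claimed advantage is precisely that it does not require such a W to exist for the specification.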