996 results for data redundancy


Relevance: 30.00%

Abstract:

Dispersing a data object into a set of data shares is an elemental stage in distributed communication and storage systems. In comparison to data replication, data dispersal with redundancy saves space and bandwidth. Moreover, dispersing a data object to distinct communication links or storage sites limits adversarial access to whole data and tolerates loss of a part of data shares. Existing data dispersal schemes have been proposed mostly based on various mathematical transformations on the data which induce high computation overhead. This paper presents a novel data dispersal scheme where each part of a data object is replicated, without encoding, into a subset of data shares according to combinatorial design theory. Particularly, data parts are mapped to points and data shares are mapped to lines of a projective plane. Data parts are then distributed to data shares using the point and line incidence relations in the plane so that certain subsets of data shares collectively possess all data parts. The presented scheme incorporates combinatorial design theory with inseparability transformation to achieve secure data dispersal at reduced computation, communication and storage costs. Rigorous formal analysis and experimental study demonstrate significant cost-benefits of the presented scheme in comparison to existing methods.
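The point-and-line mapping described above can be illustrated with the smallest projective plane, the Fano plane (7 points, 7 lines). This is only a minimal sketch under our own assumptions: the choice of the order-2 plane, the names `disperse` and `recover`, and the dictionary representation of a share are all hypothetical, and the inseparability transformation mentioned in the abstract is not modelled at all.

```python
from itertools import combinations

# Lines of the Fano plane (the projective plane of order 2) on points 0..6.
# Every pair of points lies on exactly one line; every point lies on three lines.
FANO_LINES = [
    (0, 1, 2), (0, 3, 4), (0, 5, 6),
    (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5),
]

def disperse(parts):
    """Build one share per line; a share stores the parts at that line's points."""
    if len(parts) != 7:
        raise ValueError("this toy plane has exactly 7 points")
    return [{p: parts[p] for p in line} for line in FANO_LINES]

def recover(shares):
    """Merge shares; return the 7 parts in order, or None if any part is missing."""
    merged = {}
    for share in shares:
        merged.update(share)
    if len(merged) < 7:
        return None
    return [merged[i] for i in range(7)]

parts = [f"part{i}" for i in range(7)]
shares = disperse(parts)

# Any 5 of the 7 shares suffice, so the loss of any 2 shares is tolerated.
print(all(recover(c) == parts for c in combinations(shares, 5)))
# No pair of shares suffices: two lines cover only 5 of the 7 points.
print(any(recover(c) is not None for c in combinations(shares, 2)))
```

In this toy plane, three shares whose lines pass through a common point already hold every part, while three lines forming a triangle miss one point. This is one concrete instance of "certain subsets of data shares collectively possess all data parts".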

Relevance: 30.00%

Abstract:

Outlier detection in high dimensional categorical data has been a problem of much interest due to the extensive use of qualitative features for describing the data across various application areas. Though there exist various established methods for dealing with the dimensionality aspect through feature selection on numerical data, the categorical domain is actively being explored. As outlier detection is generally considered as an unsupervised learning problem due to lack of knowledge about the nature of various types of outliers, the related feature selection task also needs to be handled in a similar manner. This motivates the need to develop an unsupervised feature selection algorithm for efficient detection of outliers in categorical data. Addressing this aspect, we propose a novel feature selection algorithm based on the mutual information measure and the entropy computation. The redundancy among the features is characterized using the mutual information measure for identifying a suitable feature subset with less redundancy. The performance of the proposed algorithm in comparison with the information gain based feature selection shows its effectiveness for outlier detection. The efficacy of the proposed algorithm is demonstrated on various high-dimensional benchmark data sets employing two existing outlier detection methods.
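To make the role of entropy and mutual information concrete, the following is a minimal sketch of an unsupervised, redundancy-aware feature selection for categorical columns. The abstract does not give the actual algorithm, so the greedy criterion used here (entropy minus mean mutual information with the features already selected) and the names `entropy`, `mutual_information` and `select_features` are assumptions made for illustration only.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of a sequence of categorical values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two equally long categorical columns."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def select_features(columns, k):
    """Greedily pick up to k feature names with high entropy and low redundancy.

    Assumed criterion (not taken from the paper): score = H(f) minus the
    mean mutual information between f and the features selected so far.
    """
    k = min(k, len(columns))
    selected = [max(columns, key=lambda name: entropy(columns[name]))]
    while len(selected) < k:
        def score(name):
            redundancy = sum(
                mutual_information(columns[name], columns[s]) for s in selected
            ) / len(selected)
            return entropy(columns[name]) - redundancy
        candidates = [name for name in columns if name not in selected]
        selected.append(max(candidates, key=score))
    return selected

# Toy data: "shade" duplicates "color", while "size" is independent of both.
columns = {
    "color": ["r", "r", "g", "g", "r", "r", "g", "g"],
    "shade": ["r", "r", "g", "g", "r", "r", "g", "g"],
    "size":  ["s", "l", "s", "l", "s", "l", "s", "l"],
}
print(select_features(columns, 2))
```

On this toy data the duplicated column is skipped in favour of the independent one, which is the behaviour that a "feature subset with less redundancy" calls for.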

Relevance: 30.00%

Abstract:

Most of the manual labor needed to create the geometric building information model (BIM) of an existing facility is spent converting raw point cloud data (PCD) to a BIM description. Automating this process would drastically reduce the modeling cost. Surface extraction from PCD is a fundamental step in this process. Compact modeling of redundant points in PCD as a set of planes leads to smaller file size and fast interactive visualization on cheap hardware. Traditional approaches for smooth surface reconstruction do not explicitly model the sparse scene structure or significantly exploit the redundancy. This paper proposes a method based on sparsity-inducing optimization to address the planar surface extraction problem. Through sparse optimization, points in PCD are segmented according to their embedded linear subspaces. Within each segmented part, plane models can be estimated. Experimental results on a typical noisy PCD demonstrate the effectiveness of the algorithm.

Relevance: 30.00%

Abstract:

Purpose – This paper aims to examine the antecedent influences and merits of workplace occupations as a tactical response to employer redundancy initiatives.

Design/methodology/approach – The data are based on analysis of secondary documentary material reporting on three workplace occupations in the Republic of Ireland during 2009.

Findings – Perceptions of both procedural (e.g. employer unilateral action) and substantive (e.g. pay and entitlements) justice appear to be pivotal influences. Spillover effects from other known occupations may also be influential. Workplace occupations were found to produce some modest substantive gains, such as enhanced redundancy payments. The tactic of workplace occupation was also found to transform unilateral employer action into scenarios based upon negotiated settlement supported by third-party mediation. However, the tactic of workplace occupation in response to redundancy runs the risk of judicial injunction and sanction.

Research limitations/implications – Although operationally difficult, future studies should strive to collect primary data on workplace occupations as they occur.

Originality/value – The paper identifies conditions conducive to the genesis of workplace occupations and the extent to which the tactic may be of benefit in particular circumstances to workers facing redundancy. It also contextualises the tactic in relation to both collective mobilisation and bargaining theories in employment relations.

Relevance: 30.00%

Abstract:

Data compression is the computing technique that aims to reduce the size of information in order to minimize the storage space required and to speed up data transmission over bandwidth-limited networks. Several compression techniques, such as LZ77 and its variants, suffer from a problem that we call redundancy caused by the multiplicity of encodings. Multiplicity of encodings (ME) means that the source data can be encoded in different ways. In its simplest case, ME occurs when a compression technique has the possibility, during the encoding process, of coding a symbol in different ways. The bit recycling compression technique was introduced by D. Dubé and V. Beaudoin to minimize the redundancy caused by ME. Variants of bit recycling have been applied to LZ77, and the experimental results show better compression (a reduction of about 9% in the size of files compressed by Gzip, obtained by exploiting ME). Dubé and Beaudoin pointed out that their technique might not perfectly minimize the redundancy caused by ME, because it is built on Huffman coding, which cannot handle codewords of fractional length; that is, it can only generate codewords of integral length. Moreover, Huffman-based bit recycling (HuBR) imposes additional constraints to avoid certain situations that degrade its performance. Unlike Huffman codes, arithmetic coding (AC) can handle codewords of fractional length. Furthermore, over the last decades arithmetic codes have attracted many researchers because they are more powerful and more flexible than Huffman codes.
Consequently, this work aims to adapt bit recycling to arithmetic codes in order to improve coding efficiency and flexibility. We addressed this problem through our four (published) contributions. These contributions are presented in this thesis and can be summarized as follows. First, we propose a new technique for adapting Huffman-based bit recycling (HuBR) to arithmetic coding. This technique is named arithmetic-coding-based bit recycling (ACBR). It describes the framework and the principles of adapting HuBR to ACBR. We also present the theoretical analysis needed to estimate the redundancy that can be removed using HuBR and ACBR for applications that suffer from ME. This analysis shows that ACBR achieves perfect recycling in all cases, whereas HuBR achieves such performance only in very specific cases. Second, the problem with the aforementioned ACBR technique is that it requires arbitrary-precision calculations, which demand unlimited (or infinite) resources. To make the technique usable, we propose a new finite-precision version. The technique thereby becomes efficient and applicable on computers with conventional fixed-size registers, and it can easily be interfaced with applications that suffer from ME. Third, we propose the use of HuBR and ACBR as a means of reducing redundancy in order to obtain a variable-to-fixed binary code. We proved theoretically and experimentally that both techniques yield a significant improvement (less redundancy). In this respect, ACBR outperforms HuBR and covers a wider class of binary sources that can benefit from a plurally parsable dictionary. In addition, we show that ACBR is more flexible than HuBR in practice.
Fourth, we use HuBR to reduce the redundancy of the balanced codes generated by Knuth's algorithm. In order to compare the performance of HuBR and ACBR, the corresponding theoretical results for HuBR and ACBR are presented. The results show that the two techniques achieve almost the same redundancy reduction on the balanced codes generated by Knuth's algorithm.
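As an illustration of where a multiplicity of encodings can arise in Knuth's balancing scheme, the sketch below lists every prefix length whose inversion balances a binary word. It is a simplified model under our own assumptions: the function names are hypothetical, the index is returned as a plain integer rather than being encoded in a balanced prefix, and neither HuBR nor ACBR is implemented.

```python
from itertools import product

def flip_prefix(word, k):
    """Invert the first k bits of a binary word (a sequence of 0s and 1s)."""
    return tuple(1 - b for b in word[:k]) + tuple(word[k:])

def balancing_indices(word):
    """All k in 0..n-1 for which inverting the first k bits balances the word."""
    n = len(word)
    if n % 2:
        raise ValueError("word length must be even")
    return [k for k in range(n) if sum(flip_prefix(word, k)) == n // 2]

def encode(word):
    """Balance a word using the smallest valid index; return (index, balanced word)."""
    k = balancing_indices(word)[0]
    return k, flip_prefix(word, k)

def decode(k, balanced):
    """Undo the encoding by inverting the same prefix again."""
    return flip_prefix(balanced, k)

# At least one balancing index always exists for an even-length word,
# so encoding and decoding round-trip for every 6-bit word.
print(all(decode(*encode(w)) == w for w in product((0, 1), repeat=6)))

# Some words admit several balancing indices: each choice is a valid encoding.
print(balancing_indices((1, 0, 1, 0)))
```

Whenever `balancing_indices` returns more than one value, the encoder has a free choice among equivalent encodings. That freedom is the kind of redundancy that bit recycling is meant to exploit.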

Relevance: 30.00%

Abstract:

Response times in a visual object recognition task decrease significantly when targets can be distinguished on the basis of two redundant attributes. The redundancy gain for two attributes is a common finding in the literature, but a gain caused by three redundant attributes has been observed only when the three attributes came from three different modalities (tactile, auditory and visual). The present study demonstrates that a redundancy gain for three attributes of the same modality is indeed possible. It also includes a more detailed investigation of the characteristics of the redundancy gain. Besides the decrease in response times, these include, in particular, a decrease in minimal response times and an increase in the symmetry of the response time distribution. This study presents evidence that neither race models nor coactivation models can explain the full set of characteristics of the redundancy gain. In this context, we introduce a new method for evaluating the triple redundancy gain based on performance on doubly redundant targets. The cascade model is presented to explain the results of this study. This model comprises several processing pathways that are triggered by a cascade of activations before satisfying a single decision criterion. It offers a homogeneous approach to previous research on the redundancy gain. Analysis of the characteristics of response time distributions, namely their mean, symmetry, shift or spread, is an essential tool for this study. It was important to find a statistical test capable of reflecting differences in all of these characteristics.
We address the problem of analyzing response times without loss of information, as well as the inadequacy of common analysis methods in this context, such as pooling the response times of several participants (e.g. Vincentizing). Distribution tests, the best known being the Kolmogorov-Smirnov test, are a better alternative for comparing distributions, those of response times in particular. A test still unknown in psychology is introduced: the two-sample Anderson-Darling test. The two tests are compared, and we then present conclusive evidence of the power of the Anderson-Darling test: when comparing distributions that differ only in (1) their shift, (2) their spread, (3) their symmetry, or (4) their tails, the Anderson-Darling test detects the differences better. In addition, the Anderson-Darling test has a type I error rate that corresponds exactly to alpha, whereas the Kolmogorov-Smirnov test is too conservative. Consequently, the Anderson-Darling test requires less data to reach sufficient statistical power.
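For readers unfamiliar with the two tests, here is a minimal sketch of the two test statistics computed from raw samples. It rests on several assumptions: the Anderson-Darling statistic follows the standard two-sample form for samples without ties, the function names are ours, and no critical values or p-values are computed, so the sketch does not reproduce the power comparison reported in the abstract.

```python
from bisect import bisect_right

def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the two ECDFs."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for v in xs + ys:
        fx = bisect_right(xs, v) / len(xs)
        fy = bisect_right(ys, v) / len(ys)
        d = max(d, abs(fx - fy))
    return d

def ad_statistic(x, y):
    """Two-sample Anderson-Darling statistic, assuming no tied observations.

    A^2 = 1/(m*n) * sum_{i=1}^{N-1} (M_i*N - m*i)^2 / (i*(N-i)),
    where N = m + n and M_i counts x-observations among the i smallest pooled values.
    """
    m, n = len(x), len(y)
    total = m + n
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    count_x = 0
    acc = 0.0
    for i, (_, label) in enumerate(pooled[:-1], start=1):
        if label == 0:
            count_x += 1
        acc += (count_x * total - m * i) ** 2 / (i * (total - i))
    return acc / (m * n)

# Hypothetical response times (ms) for two conditions.
single = [412, 455, 430, 498, 467, 441]
redundant = [388, 402, 395, 421, 409, 399]
print(ks_statistic(single, redundant), ad_statistic(single, redundant))
```

The weight 1/(i(N-i)) in the Anderson-Darling sum grows toward both ends of the pooled sample, which is why that statistic is more sensitive than the Kolmogorov-Smirnov statistic to differences in the tails of the distributions.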

Relevance: 30.00%

Abstract:

Big data is a fashionable topic nowadays, regardless of what people mean when they use the term. But being big is just a matter of volume, although there is no clear agreement on the size threshold. On the other hand, it is easy to capture large amounts of data using a brute-force approach. So the real goal should not be big data but to ask ourselves, for a given problem, what the right data is and how much of it is needed. For some problems this would imply big data, but for the majority of problems much less data is, and will be, needed. In this talk we explore the trade-offs involved and the main problems that come with big data, using the Web as a case study: scalability, redundancy, bias, noise, spam, and privacy.

Speaker Biography: Ricardo Baeza-Yates

Ricardo Baeza-Yates is VP of Research for Yahoo Labs, leading teams in the United States, Europe and Latin America since 2006, and based in Sunnyvale, California, since August 2014. During this time he has led the labs in Barcelona and Santiago de Chile. Between 2008 and 2012 he also oversaw the Haifa lab. He is also a part-time Professor at the Dept. of Information and Communication Technologies of the Universitat Pompeu Fabra, in Barcelona, Spain. During 2005 he was an ICREA research professor at the same university. Until 2004 he was Professor, and before that founder and Director of the Center for Web Research, at the Dept. of Computing Science of the University of Chile (on leave of absence until today). He obtained a Ph.D. in CS from the University of Waterloo, Canada, in 1989. Before that he obtained two master's degrees (M.Sc. CS & M.Eng. EE) and the electronics engineer degree from the University of Chile in Santiago. He is co-author of the best-selling textbook Modern Information Retrieval, published in 1999 by Addison-Wesley with a second enlarged edition in 2011, which won the ASIST 2012 Book of the Year award.
He is also co-author of the 2nd edition of the Handbook of Algorithms and Data Structures, Addison-Wesley, 1991; and co-editor of Information Retrieval: Algorithms and Data Structures, Prentice-Hall, 1992, among more than 500 other publications. From 2002 to 2004 he was elected to the board of governors of the IEEE Computer Society, and in 2012 he was elected to the ACM Council. He has received the Organization of American States award for young researchers in exact sciences (1993), the Graham Medal for innovation in computing given by the University of Waterloo to distinguished alumni (2007), the CLEI Latin American distinction for contributions to CS in the region (2009), and the National Award of the Chilean Association of Engineers (2010), among other distinctions. In 2003 he was the first computer scientist to be elected to the Chilean Academy of Sciences, and since 2010 he has been a founding member of the Chilean Academy of Engineering. In 2009 he was named ACM Fellow and in 2011 IEEE Fellow.

Relevance: 30.00%

Abstract:

Background: This study describes a bioinformatics approach designed to identify Plasmodium vivax proteins potentially involved in reticulocyte invasion. Specifically, different protein training sets were built and tuned based on different biological parameters, such as experimental evidence of secretion and/or involvement in invasion-related processes. A profile-based sequence method supported by hidden Markov models (HMMs) was then used to build classifiers to search for biologically-related proteins. The transcriptional profile of the P. vivax intra-erythrocyte developmental cycle was then screened using these classifiers. Results: A bioinformatics methodology for identifying potentially secreted P. vivax proteins was designed using sequence redundancy reduction and probabilistic profiles. This methodology led to identifying a set of 45 proteins that are potentially secreted during the P. vivax intra-erythrocyte development cycle and could be involved in cell invasion. Thirteen of the 45 proteins have already been described as vaccine candidates; there is experimental evidence of protein expression for 7 of the 32 remaining ones, while no previous studies of expression, function or immunology have been carried out for the additional 25. Conclusions: The results support the idea that probabilistic techniques like profile HMMs improve similarity searches. Also, different adjustments such as sequence redundancy reduction using Pisces or Cd-Hit allowed data clustering based on rational reproducible measurements. This kind of approach for selecting proteins with specific functions is highly important for supporting large-scale analyses that could aid in the identification of genes encoding potential new target antigens for vaccine development and drug design. The present study has led to targeting 32 proteins for further testing regarding their ability to induce protective immune responses against P. vivax malaria.

Relevance: 30.00%

Abstract:

Diabetes, like many diseases and biological processes, is not mono-causal. On the one hand, multifactorial studies with complex experimental designs are required for its comprehensive analysis. On the other hand, the data from these studies often include a substantial amount of redundancy, such as proteins that are typically represented by a multitude of peptides. Coping simultaneously with both complexities (experimental and technological) makes data analysis a challenge for bioinformatics.

Relevance: 30.00%

Abstract:

Demand for organic milk is partially driven by consumer perceptions that it is more nutritious. However, there is still considerable uncertainty over whether the use of organic production standards affects milk quality. Here we report results of meta-analyses based on 170 published studies comparing the nutrient content of organic and conventional bovine milk. There were no significant differences in total SFA and MUFA concentrations between organic and conventional milk. However, concentrations of total PUFA and n-3 PUFA were significantly higher in organic milk, by an estimated 7 (95 % CI −1, 15) % and 56 (95 % CI 38, 74) %, respectively. Concentrations of α-linolenic acid (ALA), very long-chain n-3 fatty acids (EPA+DPA+DHA) and conjugated linoleic acid were also significantly higher in organic milk, by an estimated 69 (95 % CI 53, 84) %, 57 (95 % CI 27, 87) % and 41 (95 % CI 14, 68) %, respectively. As there were no significant differences in total n-6 PUFA and linoleic acid (LA) concentrations, the n-6:n-3 and LA:ALA ratios were lower in organic milk, by an estimated 71 (95 % CI −122, −20) % and 93 (95 % CI −116, −70) %, respectively. It is concluded that organic bovine milk has a more desirable fatty acid composition than conventional milk. Meta-analyses also showed that organic milk has significantly higher α-tocopherol and Fe concentrations, but lower I and Se concentrations. Redundancy analysis of data from a large cross-European milk quality survey indicates that the higher grazing/conserved forage intakes in organic systems were the main reason for the differences in milk composition.

Relevance: 30.00%

Abstract:

In the present study, we propose a graph-theoretical procedure to investigate multiple pathways in brain functional networks. By taking into account all the possible paths consisting of h links between the node pairs of the network, we measured the global network redundancy R(h) as the number of parallel paths and the global network permeability P(h) as the probability of getting connected. We used this procedure to investigate the structural and dynamical changes in the cortical networks estimated from a dataset of high-resolution EEG signals in a group of spinal cord injured (SCI) patients during attempted foot movement. In a statistical contrast with a healthy population, the permeability index P(h) of the SCI networks increased significantly (P < 0.01) in the Theta frequency band (3-6 Hz) for distances h ranging from 2 to 4. In contrast, no significant differences were found between the two populations for the redundancy index R(h). The most significant changes in the brain functional network of SCI patients occurred mainly in the lower spectral contents. These changes were related to an improved propagation of communication between the closest cortical areas rather than to a different level of redundancy. This evidence strengthens the hypothesis that a higher functional interaction among the closest ROIs is needed as a mechanism to compensate for the lack of feedback from the peripheral nerves to the sensorimotor areas.
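The two indices can be sketched on a small undirected graph as follows. The abstract does not give the exact normalisation, so this sketch assumes that R(h) is the mean number of simple paths of exactly h links per ordered node pair and that P(h) is the fraction of ordered node pairs joined by at least one such path. These definitions and the function names are illustrative assumptions, not the authors' formulas.

```python
def count_paths(adj, src, dst, h, visited=None):
    """Count simple paths (no repeated node) with exactly h links from src to dst."""
    if visited is None:
        visited = {src}
    if h == 0:
        return 1 if src == dst else 0
    total = 0
    for nxt in adj[src]:
        if nxt not in visited:
            total += count_paths(adj, nxt, dst, h - 1, visited | {nxt})
    return total

def redundancy_and_permeability(adj, h):
    """Return (R, P) over all ordered pairs of distinct nodes (assumed normalisation)."""
    pairs = [(i, j) for i in adj for j in adj if i != j]
    counts = [count_paths(adj, i, j, h) for i, j in pairs]
    redundancy = sum(counts) / len(pairs)
    permeability = sum(1 for c in counts if c > 0) / len(pairs)
    return redundancy, permeability

# A 4-node ring: opposite nodes are joined by two parallel 2-link paths.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
for h in (1, 2, 3):
    print(h, redundancy_and_permeability(ring, h))
```

Under these assumed definitions the two indices can move independently: adding a parallel route between nodes that are already connected raises R(h) but leaves P(h) unchanged, which is consistent with the abstract reporting a change in permeability without a change in redundancy.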

Relevance: 30.00%

Abstract:

Recently, much attention has been given to mass spectrometry (MS) based disease classification, diagnosis, and protein-based biomarker identification. As in microarray-based investigations, proteomic data generated by this kind of high-throughput experiment often have a high feature-to-sample ratio. Moreover, biological information and patterns are confounded with data noise, redundancy and outliers. Thus, the development of algorithms and procedures for the analysis and interpretation of this kind of data is of paramount importance. In this paper, we propose a hybrid system for analyzing such high-dimensional data. The proposed method uses a feature extraction and selection procedure based on the k-means clustering algorithm to bridge the filter selection and wrapper selection methods. The potentially informative mass/charge (m/z) markers selected by filters are subjected to the k-means clustering algorithm for correlation and redundancy reduction, and a multi-objective Genetic Algorithm selector is then employed to identify discriminative m/z markers among those generated by the k-means clustering algorithm. Experimental results obtained with the proposed method indicate that it is suitable for m/z biomarker selection and MS-based sample classification.

Relevance: 30.00%

Abstract:

People with special medical monitoring needs can, these days, be sent home and remotely monitored through the use of data logging medical sensors and a transmission base-station. While this can improve quality of life by allowing the patient to spend most of their time at home, most current technologies rely on hardwired landline technology or expensive mobile data transmissions to transmit data to a medical facility. The aim of this paper is to investigate and develop an approach to increase the freedom of a monitored patient and decrease costs by utilising mobile technologies and SMS messaging to transmit data from patient to medico. To this end, we evaluated the capabilities of SMS and propose a generic communications protocol which can work within the constraints of the SMS format, but provide the necessary redundancy and robustness to be used for the transmission of non-critical medical telemetry from data logging medical sensors.
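The abstract does not describe the protocol itself, so the following is a purely hypothetical sketch of how telemetry could be framed to fit the 160-character limit of a single GSM 7-bit SMS while still allowing loss and corruption to be detected. The frame layout (sequence number, frame count, CRC-32, hex payload) and the names `to_frames` and `from_frames` are our own assumptions.

```python
import zlib

SMS_LIMIT = 160                       # characters in a single GSM 7-bit SMS
HEADER_LEN = 2 + 2 + 8                # seq, total and CRC-32, all as hex digits
CHUNK_LEN = SMS_LIMIT - HEADER_LEN    # payload characters per frame

def to_frames(data: bytes) -> list[str]:
    """Split data into SMS-sized text frames: seq | total | crc32 | hex payload."""
    text = data.hex().upper()
    chunks = [text[i:i + CHUNK_LEN] for i in range(0, len(text), CHUNK_LEN)] or [""]
    total = len(chunks)
    if total > 255:
        raise ValueError("payload too large for a one-byte frame count")
    frames = []
    for seq, chunk in enumerate(chunks):
        head = f"{seq:02X}{total:02X}"
        crc = zlib.crc32((head + chunk).encode("ascii"))
        frames.append(f"{head}{crc:08X}{chunk}")
    return frames

def from_frames(frames):
    """Reassemble frames received in any order, possibly duplicated or corrupted.

    Returns (data, missing): data is None while frames are missing, and
    missing lists the sequence numbers to request again.
    """
    good, total = {}, None
    for frame in frames:
        try:
            seq, tot, crc = int(frame[0:2], 16), int(frame[2:4], 16), int(frame[4:12], 16)
        except ValueError:
            continue                   # unreadable header: treat the frame as lost
        body = frame[HEADER_LEN:]
        if zlib.crc32((frame[0:4] + body).encode("ascii", "replace")) != crc:
            continue                   # checksum mismatch: treat the frame as lost
        good[seq] = body
        total = tot
    if total is None:
        return None, []
    missing = [s for s in range(total) if s not in good]
    if missing:
        return None, missing
    return bytes.fromhex("".join(good[s] for s in range(total))), []

readings = bytes(range(256)) * 2       # stand-in for logged sensor data
frames = to_frames(readings)
data, missing = from_frames(frames[::-1] + frames[:2])   # reordered and duplicated
print(len(frames), data == readings, missing)
```

In this sketch the redundancy lies in the per-frame checksum and sequence numbering, which let the receiver discard damaged messages and request only the missing ones. A real deployment would also need acknowledgements, timeouts and message identifiers, none of which are shown here.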

Relevance: 30.00%

Abstract:

A strategy to measure bacterial functional redundancy was developed and tested with soils collected along a soil reclamation gradient by determining the richness and diversity of bacterial groups capable of in situ growth on selected carbon substrates. Soil cores were collected from four sites along a transect from the Jamari tin mine site in the Jamari National Forest, Rondonia, RO, Brazil: denuded mine spoil, soil from below the canopy of invading pioneer trees, revegetated soil under new growth on the forest edge, and the forest floor of an adjacent preserved forest. Bacterial population responses were analyzed by amending these soil samples with individual carbon substrates in the presence of bromodeoxyuridine (BrdU). BrdU-labeled DNA was then subjected to a 16S-23S rRNA intergenic analysis to depict the actively growing bacteria from each site. The number and diversity of bacterial groups responding to four carbon substrates (L-serine, L-threonine, sodium citrate, and α-lactose hydrate) increased along the reclamation-vegetation gradient such that the preserved forest soil samples contained the highest functional redundancy for each substrate. These data suggest that bacterial functional redundancy increases in relation to the regrowth of plant communities and may therefore represent an important aspect of the restoration of soil biological functionality to reclaimed mine spoils. They also suggest that bacterial functional redundancy may be a useful indicator of soil quality and ecosystem functioning.

Relevance: 30.00%

Abstract:

Cyclin-dependent kinases (CDKs) successively phosphorylate the retinoblastoma protein (RB) at the restriction point in G1 phase. Hyperphosphorylation results in functional inactivation of RB, activation of the E2F transcriptional program, and entry of cells into S phase. RB unphosphorylated at serine 608 has growth suppressive activity. Phosphorylation of serines 608/612 inhibits binding of E2F-1 to RB. In Nalm-6 acute lymphoblastic leukemia extracts, serine 608 is phosphorylated by CDK4/6 complexes but not by CDK2. We reasoned that phosphorylation of serines 608/612 by redundant CDKs could accelerate phospho group formation and determined which G1 CDK contributes to serine 612 phosphorylation. Here, we report that CDK4 complexes from Nalm-6 extracts phosphorylated the CDK2-preferred serine 612 in vitro, and that this phosphorylation was inhibited by p16INK4a and fascaplysin. In contrast, serine 780 and serine 795 were efficiently phosphorylated by CDK4 but not by CDK2. The data suggest that the redundancy in phosphorylation of RB by CDK2 and CDK4 in Nalm-6 extracts is limited. Serine 612 phosphorylation by CDK4 also occurred in extracts of childhood acute lymphoblastic leukemia cells but not in extracts of mobilized CD34+ hemopoietic progenitor cells. This phenomenon could contribute to the commitment of childhood acute lymphocytic leukemia cells to proliferate and explain their refractoriness to differentiation-inducing agents.