993 resultados para Data encoding
Resumo:
The Testisin gene (PRSS21) encodes a glycosylphosphatidylinositol (GPI)-linked serine protease that exhibits testis tissue-specific expression. Loss of Testisin has been implicated in testicular tumorigenesis, but its role in testis biology and tumorigenesis is not known. Here we have investigated the role of CpG methylation in Testisin gene inactivation and tested the hypothesis that Testisin may act as a tumour suppressor for testicular tumorigenesis. Using sequence analysis of bisulphite-treated genomic DNA, we find a strong relationship between hypermethylation of a 385 bp 50 CpG rich island of the Testisin gene, and silencing of the Testisin gene in a range of human tumour cell lines and in 100% (eight/eight) of testicular germ cell tumours. We show that treatment of Testisin-negative cell lines with demethylating agents and/or a histone deacetylase inhibitor results in reactivation of Testisin gene expression, implicating hypermethylation in Testisin gene silencing. Stable expression of Testisin in the Testisin-negative Tera-2 testicular cancer line suppressed tumorigenicity as revealed by inhibition of both anchorage-dependent cell growth and tumour formation in an SCID mouse model of testicular tumorigenesis. Together, these data show that loss of Testisin is caused, at least in part, by DNA hypermethylation and histone deacetylation, and suggest a tumour suppressor role for Testisin in testicular tumorigenesis.
Resumo:
Digital image processing is exploited in many diverse applications but the size of digital images places excessive demands on current storage and transmission technology. Image data compression is required to permit further use of digital image processing. Conventional image compression techniques based on statistical analysis have reached a saturation level so it is necessary to explore more radical methods. This thesis is concerned with novel methods, based on the use of fractals, for achieving significant compression of image data within reasonable processing time without introducing excessive distortion. Images are modelled as fractal data and this model is exploited directly by compression schemes. The validity of this is demonstrated by showing that the fractal complexity measure of fractal dimension is an excellent predictor of image compressibility. A method of fractal waveform coding is developed which has low computational demands and performs better than conventional waveform coding methods such as PCM and DPCM. Fractal techniques based on the use of space-filling curves are developed as a mechanism for hierarchical application of conventional techniques. Two particular applications are highlighted: the re-ordering of data during image scanning and the mapping of multi-dimensional data to one dimension. It is shown that there are many possible space-filling curves which may be used to scan images and that selection of an optimum curve leads to significantly improved data compression. The multi-dimensional mapping property of space-filling curves is used to speed up substantially the lookup process in vector quantisation. Iterated function systems are compared with vector quantisers and the computational complexity or iterated function system encoding is also reduced by using the efficient matching algcnithms identified for vector quantisers.
Resumo:
There is a growing demand for data transmission over digital networks involving mobile terminals. An important class of data required for transmission over mobile terminals is image information such as street maps, floor plans and identikit images. This sort of transmission is of particular interest to the service industries such as the Police force, Fire brigade, medical services and other services. These services cannot be applied directly to mobile terminals because of the limited capacity of the mobile channels and the transmission errors caused by the multipath (Rayleigh) fading. In this research, transmission of line diagram images such as floor plans and street maps, over digital networks involving mobile terminals at transmission rates of 2400 bits/s and 4800 bits/s have been studied. A low bit-rate source encoding technique using geometric codes is found to be suitable to represent line diagram images. In geometric encoding, the amount of data required to represent or store the line diagram images is proportional to the image detail. Thus a simple line diagram image would require a small amount of data. To study the effect of transmission errors due to mobile channels on the transmitted images, error sources (error files), which represent mobile channels under different conditions, have been produced using channel modelling techniques. Satisfactory models of the mobile channel have been obtained when compared to the field test measurements. Subjective performance tests have been carried out to evaluate the quality and usefulness of the received line diagram images under various mobile channel conditions. The effect of mobile transmission errors on the quality of the received images has been determined. To improve the quality of the received images under various mobile channel conditions, forward error correcting codes (FEC) with interleaving and automatic repeat request (ARQ) schemes have been proposed. The performance of the error control codes have been evaluated under various mobile channel conditions. It has been shown that a FEC code with interleaving can be used effectively to improve the quality of the received images under normal and severe mobile channel conditions. Under normal channel conditions, similar results have been obtained when using ARQ schemes. However, under severe mobile channel conditions, the FEC code with interleaving shows better performance.
Resumo:
The need for low bit-rate speech coding is the result of growing demand on the available radio bandwidth for mobile communications both for military purposes and for the public sector. To meet this growing demand it is required that the available bandwidth be utilized in the most economic way to accommodate more services. Two low bit-rate speech coders have been built and tested in this project. The two coders combine predictive coding with delta modulation, a property which enables them to achieve simultaneously the low bit-rate and good speech quality requirements. To enhance their efficiency, the predictor coefficients and the quantizer step size are updated periodically in each coder. This enables the coders to keep up with changes in the characteristics of the speech signal with time and with changes in the dynamic range of the speech waveform. However, the two coders differ in the method of updating their predictor coefficients. One updates the coefficients once every one hundred sampling periods and extracts the coefficients from input speech samples. This is known in this project as the Forward Adaptive Coder. Since the coefficients are extracted from input speech samples, these must be transmitted to the receiver to reconstruct the transmitted speech sample, thus adding to the transmission bit rate. The other updates its coefficients every sampling period, based on information of output data. This coder is known as the Backward Adaptive Coder. Results of subjective tests showed both coders to be reasonably robust to quantization noise. Both were graded quite good, with the Forward Adaptive performing slightly better, but with a slightly higher transmission bit rate for the same speech quality, than its Backward counterpart. The coders yielded acceptable speech quality of 9.6kbps for the Forward Adaptive and 8kbps for the Backward Adaptive.
Resumo:
The aim of this work was to construct short analogues of the repetitive water-binding domain of the Pseudomonas syringae ice nucleation protein, InaZ. Structural analysis of these analogues might provide data pertaining to the protein-water contacts that underlie ice nucleation. An artificial gene coding for a 48-mer repeat sequence from InaZ was synthesized from four oligodeoxyribonucleotides and ligated into the expression vector, pGEX2T. The recombinant vector was cloned in Escherichia coli and a glutathione S-transferase fusion protein obtained. This fusion protein displayed a low level of ice-nucleating activity when tested by a droplet freezing assay. The fusion protein could be cleaved with thrombin, providing a means for future recovery of the 48-mer peptide in amounts suitable for structural analysis by nuclear magnetic resonance spectroscopy.
Resumo:
Through direct modeling, a reduction of pattern-dependent errors in a standard fiber-based transmission link at 40 Gbits/s rate is demonstrated by application of a skewed data pre-encoding. The trade-off between the improvement of the bit error rate and the loss in the data rate is examined.
Resumo:
Through modelling of direct error computation, a reduction of pattern- dependent errors in a standard fiber-based transmission link at 40 Gb/s rate is demonstrated by application of a skewed data pre-encoding. The trade-off between the bit-error rate improvement and the data rate loss is examined.
Resumo:
Natural language understanding (NLU) aims to map sentences to their semantic mean representations. Statistical approaches to NLU normally require fully-annotated training data where each sentence is paired with its word-level semantic annotations. In this paper, we propose a novel learning framework which trains the Hidden Markov Support Vector Machines (HM-SVMs) without the use of expensive fully-annotated data. In particular, our learning approach takes as input a training set of sentences labeled with abstract semantic annotations encoding underlying embedded structural relations and automatically induces derivation rules that map sentences to their semantic meaning representations. The proposed approach has been tested on the DARPA Communicator Data and achieved 93.18% in F-measure, which outperforms the previously proposed approaches of training the hidden vector state model or conditional random fields from unaligned data, with a relative error reduction rate of 43.3% and 10.6% being achieved.
Resumo:
Through direct modeling, a reduction of pattern-dependent errors in a standard fiber-based transmission link at 40 Gbits/s rate is demonstrated by application of a skewed data pre-encoding. The trade-off between the improvement of the bit error rate and the loss in the data rate is examined. © 2007 Optical Society of America.
Resumo:
Through modelling of direct error computation, a reduction of pattern- dependent errors in a standard fiber-based transmission link at 40 Gb/s rate is demonstrated by application of a skewed data pre-encoding. The trade-off between the bit-error rate improvement and the data rate loss is examined.
Resumo:
Concurrent coding is an encoding scheme with 'holographic' type properties that are shown here to be robust against a significant amount of noise and signal loss. This single encoding scheme is able to correct for random errors and burst errors simultaneously, but does not rely on cyclic codes. A simple and practical scheme has been tested that displays perfect decoding when the signal to noise ratio is of order -18dB. The same scheme also displays perfect reconstruction when a contiguous block of 40% of the transmission is missing. In addition this scheme is 50% more efficient in terms of transmitted power requirements than equivalent cyclic codes. A simple model is presented that describes the process of decoding and can determine the computational load that would be expected, as well as describing the critical levels of noise and missing data at which false messages begin to be generated.
Resumo:
Coats plus is a highly pleiotropic disorder particularly affecting the eye, brain, bone and gastrointestinal tract. Here, we show that Coats plus results from mutations in CTC1, encoding conserved telomere maintenance component 1, a member of the mammalian homolog of the yeast heterotrimeric CST telomeric capping complex. Consistent with the observation of shortened telomeres in an Arabidopsis CTC1 mutant and the phenotypic overlap of Coats plus with the telomeric maintenance disorders comprising dyskeratosis congenita, we observed shortened telomeres in three individuals with Coats plus and an increase in spontaneous γH2AX-positive cells in cell lines derived from two affected individuals. CTC1 is also a subunit of the α-accessory factor (AAF) complex, stimulating the activity of DNA polymerase-α primase, the only enzyme known to initiate DNA replication in eukaryotic cells. Thus, CTC1 may have a function in DNA metabolism that is necessary for but not specific to telomeric integrity.
Resumo:
SQL Injection Attack (SQLIA) remains a technique used by a computer network intruder to pilfer an organisation’s confidential data. This is done by an intruder re-crafting web form’s input and query strings used in web requests with malicious intent to compromise the security of an organisation’s confidential data stored at the back-end database. The database is the most valuable data source, and thus, intruders are unrelenting in constantly evolving new techniques to bypass the signature’s solutions currently provided in Web Application Firewalls (WAF) to mitigate SQLIA. There is therefore a need for an automated scalable methodology in the pre-processing of SQLIA features fit for a supervised learning model. However, obtaining a ready-made scalable dataset that is feature engineered with numerical attributes dataset items to train Artificial Neural Network (ANN) and Machine Leaning (ML) models is a known issue in applying artificial intelligence to effectively address ever evolving novel SQLIA signatures. This proposed approach applies numerical attributes encoding ontology to encode features (both legitimate web requests and SQLIA) to numerical data items as to extract scalable dataset for input to a supervised learning model in moving towards a ML SQLIA detection and prevention model. In numerical attributes encoding of features, the proposed model explores a hybrid of static and dynamic pattern matching by implementing a Non-Deterministic Finite Automaton (NFA). This combined with proxy and SQL parser Application Programming Interface (API) to intercept and parse web requests in transition to the back-end database. In developing a solution to address SQLIA, this model allows processed web requests at the proxy deemed to contain injected query string to be excluded from reaching the target back-end database. This paper is intended for evaluating the performance metrics of a dataset obtained by numerical encoding of features ontology in Microsoft Azure Machine Learning (MAML) studio using Two-Class Support Vector Machines (TCSVM) binary classifier. This methodology then forms the subject of the empirical evaluation.
Resumo:
In this thesis, we investigate the role of applied physics in epidemiological surveillance through the application of mathematical models, network science and machine learning. The spread of a communicable disease depends on many biological, social, and health factors. The large masses of data available make it possible, on the one hand, to monitor the evolution and spread of pathogenic organisms; on the other hand, to study the behavior of people, their opinions and habits. Presented here are three lines of research in which an attempt was made to solve real epidemiological problems through data analysis and the use of statistical and mathematical models. In Chapter 1, we applied language-inspired Deep Learning models to transform influenza protein sequences into vectors encoding their information content. We then attempted to reconstruct the antigenic properties of different viral strains using regression models and to identify the mutations responsible for vaccine escape. In Chapter 2, we constructed a compartmental model to describe the spread of a bacterium within a hospital ward. The model was informed and validated on time series of clinical measurements, and a sensitivity analysis was used to assess the impact of different control measures. Finally (Chapter 3) we reconstructed the network of retweets among COVID-19 themed Twitter users in the early months of the SARS-CoV-2 pandemic. By means of community detection algorithms and centrality measures, we characterized users’ attention shifts in the network, showing that scientific communities, initially the most retweeted, lost influence over time to national political communities. In the Conclusion, we highlighted the importance of the work done in light of the main contemporary challenges for epidemiological surveillance. In particular, we present reflections on the importance of nowcasting and forecasting, the relationship between data and scientific research, and the need to unite the different scales of epidemiological surveillance.
Resumo:
Most cognitive functions require the encoding and routing of information across distributed networks of brain regions. Information propagation is typically attributed to physical connections existing between brain regions, and contributes to the formation of spatially correlated activity patterns, known as functional connectivity. While structural connectivity provides the anatomical foundation for neural interactions, the exact manner in which it shapes functional connectivity is complex and not yet fully understood. Additionally, traditional measures of directed functional connectivity only capture the overall correlation between neural activity, and provide no insight on the content of transmitted information, limiting their ability in understanding neural computations underlying the distributed processing of behaviorally-relevant variables. In this work, we first study the relationship between structural and functional connectivity in simulated recurrent spiking neural networks with spike timing dependent plasticity. We use established measures of time-lagged correlation and overall information propagation to infer the temporal evolution of synaptic weights, showing that measures of dynamic functional connectivity can be used to reliably reconstruct the evolution of structural properties of the network. Then, we extend current methods of directed causal communication between brain areas, by deriving an information-theoretic measure of Feature-specific Information Transfer (FIT) quantifying the amount, content and direction of information flow. We test FIT on simulated data, showing its key properties and advantages over traditional measures of overall propagated information. We show applications of FIT to several neural datasets obtained with different recording methods (magneto and electro-encephalography, spiking activity, local field potentials) during various cognitive functions, ranging from sensory perception to decision making and motor learning. Overall, these analyses demonstrate the ability of FIT to advance the investigation of communication between brain regions, uncovering the previously unaddressed content of directed information flow.