981 results for SEGMENTS
Abstract:
Microbes and their exopolysaccharides (EPS) can block xylem vessels, thereby increasing hydraulic resistance and decreasing the vase life of cut flowers and foliage. Scanning electron microscopy (SEM) is a powerful tool for investigating bacteria-induced xylem occlusion. However, conventional chemical preparation protocols for SEM can cause loss of hydrated EPS material and thereby damage bacterial biofilms during dehydration. A modified chemical fixation protocol, involving pre-fixation with 75 mM lysine plus 2.5% glutaraldehyde followed by normal fixation in 3% glutaraldehyde, was therefore tested for improved preservation of bacterial biofilm at the stem-ends of cut Acacia holosericea foliage stems. Stem-end segments with different stages of bacterial growth were obtained from stems stood in water. The lysine-based protocol was compared with four other processing protocols: critical point drying (CPD) without fixation (control), freeze-drying (FD), conventional chemical fixation followed by drying with hexamethyldisilazane (HMDS), and conventional chemical fixation with CPD. The non-fixed control, FD and glutaraldehyde fixation with HMDS drying gave poor preservation of hydrated material, including bacterial EPS. Conventional glutaraldehyde fixation followed by CPD was superior to these three methods in preserving the EPS. However, this fourth method caused condensation of biofilms during dehydration. In contrast, the modified lysine-based protocol resulted in superior preservation of EPS and biofilm structure. Thus, this fifth method was the most appropriate for examining bacterial stem-end blockage in cut ornamentals. (C) 2012 Elsevier B.V. All rights reserved.
High resolution mapping of Dense spike-ar (dsp.ar) to the genetic centromere of barley chromosome 7H
Abstract:
Spike density in barley is under the control of several major genes, as documented previously by genetic analysis of a number of morphological mutants. One such class of mutants affects rachis internode length, leading to dense or compact spikes; the underlying genes were designated dense spike (dsp). We previously delimited two introgressed genomic segments, on chromosomes 3H (21 SNP loci, 35.5 cM) and 7H (17 SNP loci, 20.34 cM), in BW265, a BC7F3 nearly isogenic line (NIL) of cv. Bowman, as potentially containing the dense spike mutant locus dsp.ar, by genotyping 1,536 single nucleotide polymorphism (SNP) markers in both BW265 and its recurrent parent. Here, the gene was allocated by high-resolution bi-parental mapping to a 0.37 cM interval between markers SC57808 (Hv_SPL14) and CAPSK06413, residing on the short and long arm, respectively, at the genetic centromere of chromosome 7H. By comparison with the collinear regions of barley, rice, sorghum and Brachypodium, this region putatively contains more than 800 genes. Classical map-based isolation of the gene dsp.ar will thus be complicated by the unfavorable relationship of genetic to physical distances at the target locus.
Abstract:
Excised stem and leaf segments and whole flowers of the allergenic weed P. hysterophorus were cultured on Murashige and Skoog's (MS) basal medium supplemented with hormones. Shoot buds readily formed in stem callus cultured on MS medium supplemented with IAA and BAP or kinetin. The leaf callus formed only roots across a wide variety of media. Suspension cultures were initiated from the leaf and stem callus. The leaf callus elicited a positive patch test response for delayed hypersensitivity in 4 patients suffering from Parthenium dermatitis, indicating its ability to synthesise the allergenic principle(s).
Abstract:
Bacterial proliferation in both vase solutions and cut flower stems has been implicated in reducing the vase life of numerous genera. The vase life of Boronia heterophylla F. Muell. (Red Boronia) was assessed at two stages of floral maturity for nine vase solution treatments covering a pH range of 2.5–5.7. Vase life for advanced harvest maturity stems ranged from 4.2 d in 10 mM citric acid + 50 mg L⁻¹ chlorine (pH 2.5) to 12.9 d after STS pulsing (pH 5.7). For normal harvest maturity stems, the corresponding range was 5.8–19.0 d. Vase solutions containing 50 mg L⁻¹ chlorine biocide decreased longevity. In contrast, pulsing with the ethylene-binding inhibitor STS significantly increased vase life. The number of bacteria in the vase solutions after 11 d was determined for stems of advanced maturity. The solution with the greatest number of bacteria, 4.0 × 10¹⁰ cfu mL⁻¹, was the water used after STS pulsing, in which the flowers lasted longest. Vase solution bacteria were enumerated on days 0, 3, 6, 9 and 12 of the vase period for stems of normal harvest maturity. There was no relationship between vase life and vase solution bacterial numbers (R̄² = 0.000). Moreover, there was a negative relationship between numbers of bacteria in basal 0–5 cm stem segments and vase life. As no correlations were evident between longevity and either pH or vase solution bacterial numbers, B. heterophylla vase life was evidently limited principally by ethylene action. (C) 2013 Elsevier B.V. All rights reserved.
Abstract:
Retinylidene Schiff base derivatives of seven lysine-containing peptides have been prepared in order to investigate solvent and neighboring-group effects on the absorption maximum of the protonated Schiff base chromophore. The peptides studied are Boc-Aib-Lys-Aib-OMe (1), Boc-Ala-Aib-Lys-OMe (2), Boc-Ala-Aib-Lys-Aib-OMe (3), Boc-Aib-Asp-Aib-Aib-Lys-Aib-OMe (4), Boc-Aib-Asp-Aib-Ala-Aib-Lys-Aib-OMe (5), Boc-Lys-Val-Gly-Phe-OMe (6) and Boc-Ser-Ala-Lys-Val-Gly-Phe-OMe (7). In all cases, protonation shifts the absorption maxima to the red by 3150–8450 cm⁻¹. For peptides 1–3, the protonation shifts are significantly larger in non-hydrogen-bonding solvents such as CHCl₃ or CH₂Cl₂ than in hydrogen-bonding solvents such as CH₃OH. The presence of a proximal Asp residue in 4 and 5 results in a pronounced blue shift of the absorption maximum of the protonated Schiff base in CHCl₃, relative to peptides lacking this residue. Peptides 6 and 7 represent small segments of the bacteriorhodopsin sequence in the vicinity of Lys-216. The presence of Ser reduces the magnitude of the protonation shift.
Abstract:
Polymer-protected gold nanoparticles have been successfully synthesized by both "grafting-from" and "grafting-to" techniques, and the synthesis methods for the gold particles were systematically studied. Two chemically different homopolymers were used to protect the gold particles: thermo-responsive poly(N-isopropylacrylamide) (PNIPAM) and polystyrene (PS). Both polymers were synthesized by a controlled/living radical polymerization process, reversible addition-fragmentation chain transfer (RAFT) polymerization, to obtain monodisperse polymers of various molar masses carrying dithiobenzoate end groups. Particles protected either with PNIPAM (PNIPAM-AuNPs) or with a mixture of the two polymers (PNIPAM/PS-AuNPs, i.e., amphiphilic gold nanoparticles) were then prepared. The particles have monodisperse polymer shells, though the cores are somewhat polydisperse. Aqueous PNIPAM-AuNPs prepared by the "grafting-from" technique show thermo-responsive properties derived from the tethered PNIPAM chains. For PNIPAM-AuNPs prepared by the "grafting-to" technique, two phase transitions of PNIPAM were observed in microcalorimetric studies of the aqueous solutions: the first, with a sharp and narrow endothermic peak, occurs at lower temperature, and the second, with a broader peak, at higher temperature. In the first transition the PNIPAM segments show much higher cooperativity than in the second. These observations are tentatively rationalized by assuming that the PNIPAM brush can be subdivided into two zones, an inner and an outer one. In the inner zone, the PNIPAM segments are close to the gold surface, densely packed and less hydrated, and they undergo the first transition. In the outer zone, the PNIPAM segments are looser and more hydrated, adopt a restricted random coil conformation, and show a phase transition that depends on both particle concentration and the chemical nature of the end groups of the PNIPAM chains.
Monolayers of the amphiphilic gold nanoparticles at the air-water interface show several characteristic regions upon compression in a Langmuir trough at room temperature, which can be attributed to polymer conformational transitions from pancake to brush. The compression isotherms are also temperature dependent, owing to the thermo-responsive properties of the tethered PNIPAM chains. The films were successfully deposited on substrates by the Langmuir-Blodgett technique. Sessile drop contact angle measurements on both sides of a monolayer deposited at room temperature reveal two slightly different contact angles, which may indicate phase separation between the tethered PNIPAM and PS chains on the gold core. The optical properties of the amphiphilic gold nanoparticles were studied both in situ at the air-water interface and in the deposited films. The in situ SPR band of the monolayer blue-shifts with compression, whereas in the deposited films it red-shifts with each deposition cycle. The blue shift is compression-induced and closely related to the conformational change of the tethered PNIPAM chains, which may decrease the polarity of the local environment of the gold cores. The red shift in the deposited films is due to weak interparticle coupling between adjacent particles. Temperature effects on the SPR band were also investigated in both cases. In situ, at constant surface pressure, an increase in temperature leads to a red shift of the SPR band, likely due to shrinking of the tethered PNIPAM chains and a slight decrease in the distance between adjacent particles, which increases interparticle coupling. In the deposited films, the SPR band red-shifts more with deposition cycles at high temperature than at low temperature.
This is because the compressibility of the polymer-coated gold nanoparticles at high temperature leads to a smaller interparticle distance, increasing interparticle coupling in the deposited multilayers.
Abstract:
A wide range of models used in agriculture, ecology, carbon cycling, climate and related studies require information on the amount of leaf material present in a given environment to correctly represent radiation, heat, momentum, water and various gas exchanges with the overlying atmosphere or the underlying soil. Leaf area index (LAI) thus often features as a critical land surface variable in the parameterisations of global and regional climate models, e.g., for radiation uptake, precipitation interception, energy conversion, gas exchange and momentum, as all of these are substantially determined by the vegetation surface. Optical wavelengths are the electromagnetic regions most commonly used in remote sensing for LAI estimation and for vegetation studies in general. The main purpose of this dissertation was to enhance the determination of LAI using optical observations from close-range remote sensing (hemispherical photography), airborne remote sensing (high-resolution colour and colour-infrared imagery) and satellite remote sensing (high-resolution SPOT 5 HRG imagery). Commonly used light extinction models are applied at all levels of optical observation. For comparative analysis, LAI was further determined using statistical relationships between spectral vegetation indices (SVIs) and ground-based LAI. The dissertation focuses on two study regions: Taita Hills in south-east Kenya, characterised by tropical cloud forest and exotic plantations, and Gatineau Park in southern Quebec, Canada, dominated by temperate hardwood forest. The procedure for sampling the sky map of gap fraction and gap size from hemispherical photographs proved to be one of the most crucial steps in accurately determining LAI. LAI and clumping index estimates were significantly affected by variation in the size of sky segments for given zenith angle ranges.
On sloping ground, gap fraction and gap size distributions show a strong upslope/downslope asymmetry of foliage elements; corrections and a sensitivity analysis for both LAI and clumping index computations were therefore demonstrated. Several SVIs can be used for LAI mapping via empirical regression analysis, provided that their sensitivity over the relevant LAI range is large enough. Large-scale LAI inversion algorithms were demonstrated and proved to be a considerably efficient alternative approach to LAI mapping. LAI can be estimated nonparametrically from the information contained solely in the remotely sensed dataset, provided that the upper-end (saturated SVI) value is accurately determined. However, further study is still required to devise a methodology, as well as instrumentation, to retrieve on-ground green leaf area index. With such ground data, the large-scale LAI inversion algorithms presented in this work could then be precisely validated. Finally, based on a literature review and this dissertation, potential future research prospects and directions were recommended.
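Light extinction models of the kind mentioned above are usually variants of the Beer–Lambert (Poisson) gap-fraction model, P(θ) = exp(−G(θ) Ω L / cos θ). The Python sketch below is an illustrative assumption, not the dissertation's own procedure: it inverts that model at the 57.5° "hinge" angle, where the projection coefficient G is close to 0.5 for most leaf angle distributions, and computes a Lang–Xiang log-averaging clumping index Ω from per-segment gap fractions in one zenith ring. Function names are hypothetical.

```python
import math
from typing import Sequence

# Assumed hinge angle: G(theta) is ~0.5 near 57.5 deg for almost any
# leaf angle distribution, so LAI can be inverted without knowing it.
HINGE_DEG = 57.5


def effective_lai_hinge(gap_fraction: float, g: float = 0.5) -> float:
    """Effective LAI from the gap fraction at the 57.5 deg hinge angle.

    Inverts the Beer-Lambert (Poisson) model P = exp(-G * L / cos(theta)).
    """
    if not 0.0 < gap_fraction <= 1.0:
        raise ValueError("gap fraction must be in (0, 1]")
    theta = math.radians(HINGE_DEG)
    return -math.log(gap_fraction) * math.cos(theta) / g


def lx_clumping_index(segment_gaps: Sequence[float]) -> float:
    """Lang-Xiang log-averaging clumping index for one zenith ring.

    segment_gaps: gap fraction measured in each azimuthal sky segment.
    Returns ln(mean P) / mean(ln P): 1.0 for a randomly distributed
    canopy, below 1.0 as foliage becomes clumped.
    """
    if not segment_gaps or any(p <= 0.0 or p > 1.0 for p in segment_gaps):
        raise ValueError("segment gap fractions must be in (0, 1]")
    mean_p = sum(segment_gaps) / len(segment_gaps)
    mean_ln_p = sum(math.log(p) for p in segment_gaps) / len(segment_gaps)
    if mean_ln_p == 0.0:
        return 1.0  # every segment fully open: no foliage in this ring
    return math.log(mean_p) / mean_ln_p
```

Dividing the effective LAI (computed from the ring-averaged gap fraction) by Ω gives a clumping-corrected estimate. Because Ω compares the log of a mean with a mean of logs, merging small sky segments into larger ones averages gaps together and pushes Ω toward 1, which is one way to see why the size of sky segments affects both the clumping index and the LAI estimate.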
Abstract:
The first complete genome sequence of capsicum chlorosis virus (CaCV) from Australia was determined using a combination of Illumina HiSeq RNA and Sanger sequencing technologies. Like other CaCV isolates, Australian CaCV had a tripartite genome structure. The large (L) RNA was 8913 nucleotides (nt) in length and contained a single open reading frame (ORF) of 8634 nt encoding a predicted RNA-dependent RNA polymerase (RdRp) in the viral-complementary (vc) sense. The medium (M) and small (S) RNA segments were 4846 and 3944 nt in length, respectively, each containing two non-overlapping ORFs in ambisense orientation, separated by an intergenic region (IGR). The M segment contained ORFs encoding the predicted non-structural movement protein (NSm; 927 nt) and the precursor of the glycoproteins (GP; 3366 nt) in the viral-sense (v) and vc strands, respectively, separated by a 449-nt IGR. The S segment coded for the predicted nucleocapsid (N) protein (828 nt) and the non-structural suppressor of silencing protein (NSs; 1320 nt) in the vc and v strands, respectively. The S RNA contained an IGR of 1663 nt, the largest IGR of all CaCV isolates sequenced so far. Comparison of the Australian CaCV genome with complete CaCV genome sequences from other geographic regions showed the highest sequence identity with a Taiwanese isolate. Genome sequence comparisons and phylogeny of all available CaCV isolates provided evidence for at least two highly diverged groups of CaCV isolates, which may warrant re-classification of the AIT-Thailand and CP-China isolates as unique tospoviruses, separate from CaCV.
Abstract:
The colour of commercial cooked black tiger prawns (Penaeus monodon) is a key quality requirement to ensure product is not rejected in wholesale markets. The colour, due to the carotenoid astaxanthin, can be affected by frozen storage, but changes in colour or astaxanthin profile during frozen storage have not been studied in detail. The aims of this study were therefore to define the astaxanthin content (as cis, trans, mono-ester and di-ester forms), together with the colour properties, of both pleopods (legs) and abdominal segments. Changes in astaxanthin content and colour properties were further determined during frozen storage (−20 °C). Total astaxanthin content decreased in all samples over time, with the rate of degradation significantly greater (P < 0.05) in pleopods than in abdomen. In both pleopods and abdomen, the rate of degradation of esterified forms was significantly greater (P < 0.05) than that of non-esterified forms. Hue angle (increase), a* value (decrease) and L value (increase) all changed significantly (P < 0.05) during storage, with changes more prevalent in the pleopods. The pleopods are the key indicator of astaxanthin and colour loss in cooked black tiger prawns, and preservation strategies are required to retain astaxanthin and colour during frozen storage.
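In CIELAB, hue angle is h_ab = atan2(b*, a*), so a falling a* (less red) at roughly constant b* raises the hue angle, which is the same direction as the two changes reported above. A minimal Python sketch of the standard formulas; the a*/b* values in the usage lines are hypothetical, not data from the study.

```python
import math


def hue_angle_deg(a_star: float, b_star: float) -> float:
    """CIELAB hue angle h_ab in degrees, wrapped into [0, 360)."""
    return math.degrees(math.atan2(b_star, a_star)) % 360.0


def chroma(a_star: float, b_star: float) -> float:
    """CIELAB chroma C*_ab = sqrt(a*^2 + b*^2)."""
    return math.hypot(a_star, b_star)


# Hypothetical readings: redness (a*) halves while b* stays constant.
before = hue_angle_deg(20.0, 20.0)  # roughly 45.0
after = hue_angle_deg(10.0, 20.0)   # roughly 63.4
print(before, after, chroma(20.0, 20.0), chroma(10.0, 20.0))
```

A larger hue angle here means the colour has moved from red toward yellow, while the accompanying drop in chroma reflects a less saturated colour.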
Abstract:
In this paper we propose a hypothetical scheme for recognizing alphanumeric characters. The scheme is based on the known physiological structure of the visual cortex and the concept of a short line extractor neuron (SLEN). We assume four basic types of such units, for extracting vertical, horizontal, right-inclined and left-inclined straight line segments. The patterns reconstructed from the scheme show perfect agreement with the test patterns. The model indicates that recognition of the letters T and H requires extraction of the largest number of features.
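The abstract does not give the detector equations, so the following is only a hypothetical Python sketch of the feature-extraction stage: four oriented "short line" units (vertical, horizontal, right-inclined, left-inclined) are slid over a binary character grid, and a unit fires wherever it finds a run of filled cells along its orientation. The grid encoding, the segment length of 3 cells and the reading of right/left inclined as "/" and "\" are all assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical (row step, col step) for each of the four unit types.
# Rows grow downward, so "/" (right inclined) steps up one row per column.
ORIENTATIONS: Dict[str, Tuple[int, int]] = {
    "vertical": (1, 0),
    "horizontal": (0, 1),
    "right_inclined": (-1, 1),
    "left_inclined": (1, 1),
}


def count_segments(grid: List[str], length: int = 3) -> Dict[str, int]:
    """Count short straight runs of `length` filled ('#') cells per orientation.

    Each (start cell, orientation) whose cells are all filled counts as one
    detector firing, loosely mimicking an array of oriented line units.
    """
    rows, cols = len(grid), len(grid[0])
    counts = {name: 0 for name in ORIENTATIONS}
    for name, (dr, dc) in ORIENTATIONS.items():
        for r in range(rows):
            for c in range(cols):
                cells = [(r + k * dr, c + k * dc) for k in range(length)]
                if all(0 <= rr < rows and 0 <= cc < cols and grid[rr][cc] == "#"
                       for rr, cc in cells):
                    counts[name] += 1
    return counts


LETTER_T = [
    "#####",
    "..#..",
    "..#..",
    "..#..",
    "..#..",
]
print(count_segments(LETTER_T))
```

The per-orientation counts form a small feature vector for each character; how the original scheme combines such features into a recognition decision is not specified in the abstract.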
Abstract:
Diseases caused by Tobacco streak virus (TSV) have resulted in significant losses in sunflower and mung bean crops in Australia. Two genetically distinct strains from central Queensland, TSV-parthenium and TSV-crownbeard, have been previously described. They share only 81% total-genome nucleotide sequence identity and have distinct major alternative hosts, Parthenium hysterophorus (parthenium) and Verbesina encelioides (crownbeard). We developed and used strain-specific multiplex polymerase chain reactions (PCRs) for the three RNA segments of TSV-parthenium and TSV-crownbeard to accurately characterise the strains naturally infecting 41 host species. Hosts included species from 11 plant families, including 12 species endemic to Australia. Results from field surveys and inoculation tests indicate that parthenium is a poor host of TSV-crownbeard. By contrast, crownbeard was both a natural host of, and experimentally infected by, TSV-parthenium, but this infection combination resulted in non-viable seed. These differences appear to be an effective biological barrier that largely restricts these two TSV strains to their respective major alternative hosts. TSV-crownbeard was seed transmitted from naturally infected crownbeard at rates of between 5% and 50% and was closely associated with the geographical distribution of crownbeard in central Queensland. TSV-parthenium and TSV-crownbeard were also seed transmitted in experimentally infected ageratum (Ageratum houstonianum) at rates of up to 40% and 27%, respectively. The related subgroup 1 ilarvirus Ageratum latent virus was also seed transmitted, at a rate of 18%, in ageratum, its major alternative host. The thrips species Frankliniella schultzei and Microcephalothrips abdominalis were commonly found in flowers of TSV-affected crops and nearby weed hosts. Both species readily transmitted TSV-parthenium and TSV-crownbeard.
The results are discussed in terms of how two genetically and biologically distinct TSV strains have similar life cycle strategies in the same environment.
Abstract:
A quarter of Australia's sunflower production comes from the central highlands region of Queensland and is currently worth six million dollars ($AUD) annually. From the early 2000s, a severe necrosis disorder of unknown aetiology affected large areas of sunflower crops in central Queensland, leading to annual losses of up to 20%. Other crops such as mung bean and cotton were also affected. This PhD study was undertaken to determine whether the causal agent of the necrosis disorder was of viral origin and, if so, to characterise its genetic diversity, biology and disease cycle, and to develop effective control strategies. The research described in this thesis identified Tobacco streak virus (TSV; genus Ilarvirus, family Bromoviridae) as the causal agent of the previously unidentified necrosis disorder of sunflower in central Queensland. TSV was also the cause of commonly found diseases in a range of other crops in the same region, including cotton, chickpea and mung bean. This was the first report from Australia of natural field infections of TSV in these four crops. TSV strains have previously been reported from other regions of Australia in several hosts, based on serological and host range studies. To determine the relatedness of these previously reported TSV strains to TSV from central Queensland, we characterised the genetic diversity of the known TSV strains from Australia. We identified two genetically distinct TSV strains from central Queensland and named them after their major alternative hosts: TSV-parthenium from Parthenium hysterophorus and TSV-crownbeard from Verbesina encelioides. They share only 81% total-genome nucleotide sequence identity. In addition to TSV-parthenium and TSV-crownbeard from central Queensland, we also described the complete genomes of two other ilarvirus species.
This showed that the previously reported TSV strains TSV-S, isolated from strawberry, and TSV-Ag, from Ageratum houstonianum, were actually the first record of Strawberry necrotic shock virus from Australia and a new subgroup 1 ilarvirus, Ageratum latent virus, respectively. Our results confirmed that the TSV strains found in central Queensland were not related to previously described strains from Australia and may represent new incursions. This is the first report of the genetic diversity within subgroup 1 ilarviruses from Australia. Based on field observations, we hypothesised that parthenium and crownbeard were acting as symptomless hosts of TSV-parthenium and TSV-crownbeard, respectively. We developed strain-specific multiplex PCRs for the three RNA segments to accurately characterise the range of naturally infected hosts across central Queensland. Results described in this thesis provide compelling evidence that parthenium and crownbeard are the major (symptomless) alternative hosts of TSV-parthenium and TSV-crownbeard. While both TSV strains had wide natural host ranges, the geographical distribution of each strain was closely associated with the distribution of its major alternative host. Both TSV strains were commonly found across large areas of central Queensland, but we found strong evidence only for the TSV-parthenium strain being associated with major disease outbreaks in nearby crops. The findings from this study demonstrate that TSV-parthenium and TSV-crownbeard have similar life cycles but some critical differences. We found both TSV strains to be highly seed transmitted from naturally infected mother plants of their respective major alternative hosts, and to survive in seed for more than 2 years. We conclusively demonstrated that both TSV strains were readily transmitted via virus-infected pollen taken from the major alternative hosts.
This transmission was facilitated by the most commonly collected thrips species, Frankliniella schultzei and Microcephalothrips abdominalis. These results illustrate the importance of seed transmission and efficient thrips vectors for the survival of these TSV strains in an often harsh environment, which enables the rapid development of TSV disease epidemics in surrounding crops. Results from field surveys and inoculation tests indicate that parthenium is a poor host of TSV-crownbeard. By contrast, crownbeard was naturally infected by, and an experimental host of, TSV-parthenium. However, this infection combination resulted in non-viable crownbeard seed. These differences appear to be an effective biological barrier that largely restricts these two TSV strains to their respective major alternative hosts. Based on our field observations, we hypothesised that different sunflower hybrids differ in their relative tolerance to TSV infection, and that seasonal variation in disease levels was related to rainfall in the critical early crop stage. Field trials conducted over multiple years conclusively demonstrated significant differences in tolerance to natural infections of TSV-parthenium across a wide range of sunflower hybrids. Glasshouse tests indicate that the resistance to TSV-parthenium identified in the sunflower hybrids is also likely to be effective against TSV-crownbeard. We found a significant negative association between TSV disease incidence in sunflowers and accumulated rainfall in March and April, with increasing rainfall resulting in reduced levels of disease. Our results indicate that the use of tolerant sunflower germplasm will be a critical strategy to minimise the risk of TSV epidemics in sunflower.
Abstract:
The analysis of sequential data is required in many diverse areas, such as telecommunications, stock market analysis and bioinformatics. A basic problem in the analysis of sequential data is sequence segmentation. A segmentation is a partition of the sequence into a number of non-overlapping segments that cover all data points, such that each segment is as homogeneous as possible. This problem can be solved optimally using a standard dynamic programming algorithm. In the first part of the thesis, we present a new approximation algorithm for the sequence segmentation problem. This algorithm has a smaller running time than the optimal dynamic programming algorithm while retaining a bounded approximation ratio. The basic idea is to divide the input sequence into subsequences, solve the problem optimally in each subsequence, and then appropriately combine the solutions to the subproblems into one final solution. In the second part of the thesis, we study alternative segmentation models that are devised to better fit the data. More specifically, we focus on clustered segmentations and segmentations with rearrangements. In the standard segmentation of a multidimensional sequence, all dimensions share the same segment boundaries; in a clustered segmentation, the dimensions are allowed to form clusters, and each cluster of dimensions is then segmented separately. We formally define the problem of clustered segmentation and show experimentally that segmenting sequences with this model leads to solutions with smaller error for the same model cost. Segmentation with rearrangements is a novel variation of the segmentation problem: in addition to partitioning the sequence, we also seek to apply a limited amount of reordering so that the overall representation error is minimized.
We formulate the problem of segmentation with rearrangements and show that it is NP-hard to solve or even to approximate. We devise effective algorithms for the proposed problem, combining ideas from dynamic programming and outlier detection algorithms for sequences. In the final part of the thesis, we discuss the problem of aggregating the results of segmentation algorithms on the same set of data points. Here, we are interested in producing a partitioning of the data that agrees as much as possible with the input partitions. We show that this problem can be solved optimally in polynomial time using dynamic programming. Furthermore, we show that not all data points are candidates for segment boundaries in the optimal solution.
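A minimal Python sketch of the standard dynamic program referred to in the abstract above, assuming piecewise-constant segments scored by squared error around each segment mean (a common choice; the thesis may use other error measures). It runs in O(n²k) time, which is the cost the thesis's divide-and-combine approximation sets out to reduce. Function names are illustrative.

```python
from typing import Callable, List, Sequence, Tuple


def _sse_table(xs: Sequence[float]) -> Callable[[int, int], float]:
    """Prefix sums so the squared error of xs[i:j] around its mean is O(1)."""
    s, s2 = [0.0], [0.0]
    for x in xs:
        s.append(s[-1] + x)
        s2.append(s2[-1] + x * x)

    def sse(i: int, j: int) -> float:
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    return sse


def optimal_segmentation(xs: Sequence[float], k: int) -> Tuple[float, List[int]]:
    """Optimal piecewise-constant k-segmentation by dynamic programming.

    Returns (total squared error, start index of each segment).
    """
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= len(xs)")
    sse = _sse_table(xs)
    inf = float("inf")
    # cost[m][j]: least error covering xs[:j] with exactly m segments.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            # The last segment is xs[i:j]; the first m - 1 cover xs[:i].
            for i in range(m - 1, j):
                c = cost[m - 1][i] + sse(i, j)
                if c < cost[m][j]:
                    cost[m][j], back[m][j] = c, i
    # Follow back-pointers from the end to recover segment starts.
    starts, j = [], n
    for m in range(k, 0, -1):
        j = back[m][j]
        starts.append(j)
    return cost[k][n], starts[::-1]


err, starts = optimal_segmentation([1, 1, 1, 5, 5, 5, 9, 9], 3)
print(err, starts)
```

On this toy sequence the three constant runs are recovered exactly with zero error. An approximation scheme of the kind described would call a routine like this on each subsequence and then merge the resulting segment representatives.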
Abstract:
Segmentation is a data mining technique that yields simplified representations of sequences of ordered points. A sequence is divided into some number of homogeneous blocks, and all points within a segment are described by a single value. The focus of this thesis is on piecewise-constant segments, where the most likely description for each segment and the most likely segmentation into a given number of blocks can be computed efficiently. Representing sequences as segmentations is useful in, e.g., storage and indexing tasks in sequence databases, and segmentation can be used as a tool for learning about the structure of a given sequence. The discussion begins with basic questions of segmentation analysis, such as choosing the number of segments and evaluating the obtained segmentations. Standard model selection techniques are shown to perform well for the sequence segmentation task. Segmentation evaluation is proposed with respect to a known segmentation structure. Applying segmentation to certain features of a sequence is shown to yield segmentations that are significantly close to the known underlying structure. Two extensions to the basic segmentation framework are introduced: unimodal segmentation and basis segmentation. The former is concerned with segmentations where the segment descriptions first increase and then decrease, and the latter with the interplay between different dimensions and segments in the sequence. These problems are formally defined, and algorithms for solving them are provided and analyzed. Practical applications of segmentation techniques include time series and data stream analysis, text analysis and biological sequence analysis. In this thesis, segmentation applications are demonstrated in the analysis of genomic sequences.
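A minimal Python sketch of choosing the number of segments with one standard model selection criterion, the Bayesian information criterion (BIC), assuming Gaussian noise around piecewise-constant segment means. BIC is our choice for illustration; the thesis evaluates standard techniques and may count parameters differently. The parameter count of 2k (k means, k − 1 boundaries, one variance) and the 1e-12 floor on the variance are assumptions.

```python
import math
from typing import List, Sequence


def segmentation_errors(xs: Sequence[float], k_max: int) -> List[float]:
    """errors[k] = least squared error of a piecewise-constant k-segmentation."""
    n = len(xs)
    s, s2 = [0.0], [0.0]
    for x in xs:
        s.append(s[-1] + x)
        s2.append(s2[-1] + x * x)

    def sse(i: int, j: int) -> float:
        t = s[j] - s[i]
        return max(0.0, (s2[j] - s2[i]) - t * t / (j - i))

    inf = float("inf")
    prev = [0.0] + [inf] * n  # zero segments can only cover the empty prefix
    errors = [inf]  # index 0 unused
    for m in range(1, k_max + 1):
        cur = [inf] * (n + 1)
        for j in range(m, n + 1):
            cur[j] = min(prev[i] + sse(i, j) for i in range(m - 1, j))
        errors.append(cur[n])
        prev = cur
    return errors


def choose_k_bic(xs: Sequence[float], k_max: int) -> int:
    """Pick k minimising a Gaussian BIC, n*ln(SSE/n) + 2k*ln(n)."""
    n = len(xs)
    errors = segmentation_errors(xs, k_max)
    scores = {
        k: n * math.log(max(errors[k] / n, 1e-12)) + 2 * k * math.log(n)
        for k in range(1, k_max + 1)
    }
    return min(scores, key=scores.get)


noisy_step = [0.1, -0.1, 0.0, 0.1, -0.1, 5.1, 4.9, 5.0, 5.1, 4.9]
print(choose_k_bic(noisy_step, 4))
```

Keep k_max well below the sequence length: with only one or two points per segment the fitted variance approaches zero and the Gaussian likelihood term rewards over-segmentation.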