909 results for Splice variant


Relevance:

10.00% 10.00%

Publisher:

Abstract:

This study elucidates some structural and biological features of galactose-binding variants of the cytotoxic proteins ricin and abrin. An isolation procedure is reported for ricin variants from Ricinus communis seeds by using lactamyl-Sepharose affinity matrix, similar to that reported previously for variants of abrin from Abrus precatorius seeds [Hegde, R., Maiti, T. K. & Podder, S. K. (1991) Anal. Biochem. 194, 101–109]. Ricin variants, subfractionated on carboxymethyl-Sepharose CL-6B ion-exchange chromatography, were characterized further by SDS/PAGE, IEF and a binding assay. Based on the immunological cross-reactivity of antibody raised against a single variant of each of ricin and abrin, it was established that all the variants of the corresponding type are immunologically indistinguishable. Analysis of protein titration curves on an immobilized pH gradient indicated that variants of abrin I differ from other abrin variants, mainly in their acidic groups and that variance in ricin is a cause of charge substitution. Detection of subunit variants of proteins by two-dimensional gel electrophoresis showed that there are twice as many subunit variants as there are variants of holoproteins, suggesting that each variant has a set of subunit variants, which, although homologous, are not identical to the subunits of any other variant with respect to pI. Seeds obtained from polymorphic species of R. communis showed no difference in the profile of toxin variants, as analyzed by isoelectric focussing. Toxin variants obtained from red and white varieties of A. precatorius, however, showed some difference in the number of variants as well as in their relative intensities. Furthermore, variants analyzed from several single seeds of A. precatorius red type revealed a controlled distribution of lectin variants in three specific groups, indicating an involvement of at least three genes in the production of Abrus lectins. The complete absence or presence of variants in each group suggested a post-translational differential proteolytic processing, a secondary event in the production of abrin variants.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Human parvovirus B19 is a minute ssDNA virus causing a wide variety of diseases, including erythema infectiosum, arthropathy, anemias, and fetal death. After primary infection, genomic DNA of B19 has been shown to persist in solid tissues of not only symptomatic but also of constitutionally healthy, immunocompetent individuals. In this thesis, the viral DNA was shown to persist as an apparently intact molecule of full length, and without persistence-specific mutations. Thus, although the mere presence of B19 DNA in tissue cannot be used as a diagnostic criterion, a possible role in the pathogenesis of diseases, e.g. through mRNA or protein production, cannot be excluded. The molecular mechanism, the host-cell type and the possible clinical significance of B19 DNA tissue persistence are yet to be elucidated. In the beginning of this work, the B19 genomic sequence was considered highly conserved. However, new variants were found: V9 was detected in 1998 in France, in serum of a child with aplastic crisis. This variant differed from the prototypic B19 sequences by ~10%. In 2002 we found, persisting in skin of constitutionally healthy humans, DNA of another novel B19 variant, LaLi. Genetically this variant differed from both the prototypic sequences and the variant V9 also by ~10%. Simultaneously, B19 isolates with DNA sequences similar to LaLi were introduced by two other groups, in the USA and France. Based on phylogeny, a classification scheme based on three genotypes (B19 types 1-3) was proposed. Although the B19 virus is mainly transmitted via the respiratory route, blood and plasma-derived products contaminated with high levels of B19 DNA have also been shown to be infectious. The European Pharmacopoeia stipulates that, in Europe, from the beginning of 2004, plasma pools for manufacture must contain less than 10⁴ IU/ml of B19 DNA. Quantitative PCR screening is therefore a prerequisite for restriction of the B19 DNA load and obtaining of safe plasma products.
Due to the DNA sequence variation among the three B19 genotypes, however, B19 PCR methods might fail to detect the new variants. We therefore examined the suitability of the two commercially available quantitative B19 PCR tests, LightCycler-Parvovirus B19 quantification kit (Roche Diagnostics) and RealArt Parvo B19 LC PCR (Artus), for detection, quantification and differentiation of the three B19 types known, including B19 types 2 and 3. The former method was highly sensitive for detection of the B19 prototype but was not suitable for detection of types 2 and 3. The latter method detected and differentiated all three B19 virus types. However, one of the two type-3 strains was detected at a lower sensitivity. Then, we assessed the prevalence of the three B19 virus types among Finnish blood donors, by screening pooled plasma samples derived from >140 000 blood-donor units: none of the pools contained detectable levels of B19 virus types 2 or 3. According to the results of other groups, B19 type 2 was also absent among Danish blood donors, and extremely rare among symptomatic European patients. B19 type 3 has been encountered endemically in Ghana and (apparently) in Brazil, and sporadic cases have been detected in France and the UK. We next examined the biological characteristics of these virus types. The p6 promoter regions of virus types 1-3 were cloned in front of a reporter gene, the constructs were transfected into different cell lines, and the promoter activities were measured. As a result, we found that the activities of the three p6 promoters, although differing in sequence by >20%, were of equal strength, and most active in B19-permissive cells. Furthermore, the infectivity of the three B19 types was examined in two B19-permissive cell lines. RT-PCR revealed synthesis of spliced B19 mRNAs, and immunofluorescence verified the production of NS1 and VP proteins in the infected cells.
These experiments suggested similar host-cell tropism and showed that the three virus types are strains of the same species, i.e. human parvovirus B19. Last but not least, the sera from subjects infected in the past either with B19 type 1 or type 2 (as evidenced by tissue persistence of the respective DNAs), revealed in VP1/2- and VP2-EIAs a 100 % cross-reactivity between virus types 1 and 2. These results, together with similar studies by others, indicate that the three B19 genotypes constitute a single serotype.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Introduction: Single nucleotide polymorphisms in ERAP2 are strongly associated with ankylosing spondylitis (AS). One AS-associated single nucleotide polymorphism, rs2248374, causes a truncated ERAP2 protein that is degraded by nonsense-mediated decay. Approximately 25% of the populations of European ancestry are therefore natural ERAP2 knockouts. We investigated the effect of this associated variant on HLA class I allele presentation, surface heavy chains, endoplasmic reticulum (ER) stress markers and cytokine gene transcription in AS. Methods: Patients with AS and healthy controls with either AA or GG homozygous status for rs2248374 were studied. Antibodies to CD14, CD19-ECD, HLA-A-B-C, Valpha7.2, CD161, anti-HC10 and anti-HLA-B27 were used to analyse peripheral blood mononuclear cells. Expression levels of ER stress markers (GRP78 and CHOP) and proinflammatory genes (tumour necrosis factor (TNF), IL6, IL17 and IL22) were assessed by qPCR. Results: There was no significant difference between rs2248374 genotypes in HLA class I allele presentation, major histocompatibility class I heavy chains, the ER stress markers GRP78 and CHOP, or proinflammatory gene expression, whether between cases, between cases and controls, or between controls. Discussion: Large differences were not seen in HLA-B27 expression or cytokine levels between subjects with and without ERAP2 in AS cases and controls. This suggests that ERAP2 is more likely to influence AS risk through other mechanisms.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

A concise, flexible approach of general utility to the furo[3,2-b]furanones from readily available Morita-Baylis-Hillman adducts is delineated. In an expeditious variant of this approach, a four-step cascade process is executed in a one-pot operation to generate the furofuranoid framework containing two quaternary centers.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The integration of stochastic wind power has accentuated a challenge for power system stability assessment. Since the power system is a time-variant system under wind generation fluctuations, pure time-domain simulations can hardly provide real-time stability assessment. As a result, the worst-case scenario is simulated to give a very conservative assessment of system transient stability. In this study, a probabilistic contingency analysis through a stability measure method is proposed to provide a less conservative contingency analysis which covers 5-min wind fluctuations and a successive fault. This probabilistic approach estimates the transfer limit of a critical line for a given fault with stochastic wind generation and active control devices in a multi-machine system. This approach achieves a lower computation cost and improved accuracy using a new stability measure and polynomial interpolation, and is feasible for online contingency analysis.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Bacilysin is a non-ribosomally synthesized dipeptide antibiotic that is active against a wide range of bacteria and some fungi. Synthesis of bacilysin (L-alanine-[2,3-epoxycyclohexano-4]-L-alanine) is achieved by proteins in the bac operon, also referred to as the bacABCDE (ywfBCDEF) gene cluster in B. subtilis. Extensive genetic analysis from several strains of B. subtilis suggests that the bacABC gene cluster encodes all the proteins that synthesize the epoxyhexanone ring of L-anticapsin. These data, however, were not consistent with the putative functional annotation for these proteins whereby BacA, a prephenate dehydratase along with a potential isomerase/guanylyl transferase, BacB and an oxidoreductase, BacC, could synthesize L-anticapsin. Here we demonstrate that BacA is a decarboxylase that acts on prephenate. Further, based on the biochemical characterization and the crystal structure of BacB, we show that BacB is an oxidase that catalyzes the synthesis of 2-oxo-3-(4-oxocyclohexa-2,5-dienyl)propanoic acid, a precursor to L-anticapsin. This protein is a bi-cupin, with two putative active sites each containing a bound metal ion. Additional electron density at the active site of the C-terminal domain of BacB could be interpreted as a bound phenylpyruvic acid. A significant decrease in the catalytic activity of a point variant of BacB with a mutation at the N-terminal domain suggests that the N-terminal cupin domain is involved in catalysis.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Segmentation defects of the vertebrae (SDV) are caused by aberrant somite formation during embryogenesis and result in irregular formation of the vertebrae and ribs. The Notch signal transduction pathway plays a critical role in somite formation and patterning in model vertebrates. In humans, mutations in several genes involved in the Notch pathway are associated with SDV, with both autosomal recessive (MESP2, DLL3, LFNG, HES7) and autosomal dominant (TBX6) inheritance. However, many individuals with SDV do not carry mutations in these genes. Using whole-exome capture and massively parallel sequencing, we identified compound heterozygous mutations in RIPPLY2 in two brothers with multiple regional SDV, with appropriate familial segregation. One novel mutation (c.A238T:p.Arg80*) introduces a premature stop codon. In transiently transfected C2C12 mouse myoblasts, the RIPPLY2 mutant protein demonstrated impaired transcriptional repression activity compared with wild-type RIPPLY2 despite similar levels of expression. The other mutation (c.240-4T>G), with minor allele frequency <0.002, lies in the highly conserved splice site consensus sequence 5' to the terminal exon. Ripply2 has a well-established role in somitogenesis and vertebral column formation, interacting at both gene and protein levels with SDV-associated Mesp2 and Tbx6. We conclude that compound heterozygous mutations in RIPPLY2 are associated with SDV, making RIPPLY2 a new gene for this condition. © The Author 2014.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The extent to which low-frequency (minor allele frequency (MAF) between 1-5%) and rare (MAF ≤ 1%) variants contribute to complex traits and disease in the general population is mainly unknown. Bone mineral density (BMD) is highly heritable, a major predictor of osteoporotic fractures, and has been previously associated with common genetic variants, as well as rare, population-specific, coding variants. Here we identify novel non-coding genetic variants with large effects on BMD (ntotal = 53,236) and fracture (ntotal = 508,253) in individuals of European ancestry from the general population. Associations for BMD were derived from whole-genome sequencing (n = 2,882 from UK10K (ref. 10); a population-based genome sequencing consortium), whole-exome sequencing (n = 3,549), deep imputation of genotyped samples using a combined UK10K/1000 Genomes reference panel (n = 26,534), and de novo replication genotyping (n = 20,271). We identified a low-frequency non-coding variant near a novel locus, EN1, with an effect size fourfold larger than the mean of previously reported common variants for lumbar spine BMD (rs11692564(T), MAF = 1.6%, replication effect size = +0.20 s.d., Pmeta = 2 × 10⁻¹⁴), which was also associated with a decreased risk of fracture (odds ratio = 0.85; P = 2 × 10⁻¹¹; ncases = 98,742 and ncontrols = 409,511). Using an En1(cre/flox) mouse model, we observed that conditional loss of En1 results in low bone mass, probably as a consequence of high bone turnover. We also identified a novel low-frequency non-coding variant with large effects on BMD near WNT16 (rs148771817(T), MAF = 1.2%, replication effect size = +0.41 s.d., Pmeta = 1 × 10⁻¹¹). In general, there was an excess of association signals arising from deleterious coding and conserved non-coding variants.
These findings provide evidence that low-frequency non-coding variants have large effects on BMD and fracture, thereby providing rationale for whole-genome sequencing and improved imputation reference panels to study the genetic architecture of complex traits and disease in the general population.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Robotics is taught in many Australian ICT classrooms, in both primary and secondary schools. Robotics activities, including those developed using the LEGO Mindstorms NXT technology, are mathematics-rich and provide a fertile ground for learners to develop and extend their mathematical thinking. However, this context for learning mathematics is often under-exploited. In this paper a variant of the model construction sequence (Lesh, Cramer, Doerr, Post, & Zawojewski, 2003) is proposed, with the purpose of explicitly integrating robotics and mathematics teaching and learning. Lesh et al.’s model construction sequence and the model eliciting activities it embeds were initially researched in primary mathematics classrooms and more recently in university engineering courses. The model construction sequence involves learners working collaboratively upon product-focussed tasks, through which they develop and expose their conceptual understanding. The integrating model proposed in this paper has been used to design and analyse a sequence of activities in an Australian Year 4 classroom. In that sequence more traditional classroom learning was complemented by the programming of LEGO-based robots to ‘act out’ the addition and subtraction of simple fractions (tenths) on a number-line. The framework was found to be useful for planning the sequence of learning and, more importantly, provided the participating teacher with the ability to critically reflect upon robotics technology as a tool to scaffold the learning of mathematics.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The downlink scheduling problem in multi-queue multi-server systems under channel uncertainty is considered. Two policies that make allocations based on predicted channel states are proposed. The first is an extension of the well-known dynamic backpressure policy to the uncertain channel case. The second is a variant that improves delay performance under light loads. The stability region of the system is characterised and the first policy is argued to be throughput optimal. A recently proposed policy of Kar et al. [1] has lower complexity, but is shown to be throughput suboptimal. Further, simulations demonstrate better delay and backlog properties for both our policies at light loads.
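As a rough illustration of allocating servers from predicted channel states, the sketch below implements a generic max-weight rule, the form that backpressure takes in a single-hop downlink. It is not the policy from the abstract: the function name, the per-server independent choice, and the use of predicted rates as weights are all assumptions made for the example.

```python
from typing import List


def max_weight_allocation(
    queue_lengths: List[int],
    predicted_rates: List[List[float]],
) -> List[int]:
    """Assign each server to the queue with the largest backlog-rate product.

    queue_lengths[i] is the backlog of queue i.
    predicted_rates[i][j] is the predicted service rate if server j
    serves queue i (hypothetical input; the prediction model is not
    specified in the abstract).
    Returns a list whose j-th entry is the queue chosen for server j.
    """
    num_queues = len(queue_lengths)
    num_servers = len(predicted_rates[0]) if predicted_rates else 0
    allocation = []
    for j in range(num_servers):
        # Weight of serving queue i on server j: backlog times predicted rate.
        best_queue = max(
            range(num_queues),
            key=lambda i: queue_lengths[i] * predicted_rates[i][j],
        )
        allocation.append(best_queue)
    return allocation
```

Ties are broken in favour of the lowest queue index, which is an arbitrary choice for this sketch.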

Relevance:

10.00% 10.00%

Publisher:

Abstract:

This thesis focuses on mutations of the POLG1 gene, which encodes the catalytic subunit polγ-α of the mitochondrial DNA polymerase gamma holoenzyme (polG), and on the association of these mutations with different clinical phenotypes. In addition, particular defective mutant variants of the protein were characterized biochemically in vitro. The polG holoenzyme is the sole DNA polymerase found in mitochondria. It is involved in replication and repair of the mitochondrial genome, mtDNA. The holoenzyme also includes the accessory subunit polγ-β, which is required for the enhanced processivity of polγ-α. Defective polγ-α causes accumulation of secondary mutations in mtDNA, which leads to a defective oxidative phosphorylation system. The clinical consequences of such mutations are variable, affecting nervous system, skeletal muscles, liver and other post-mitotic tissues. The aims of the studies included: 1) Determination of the role of POLG1 mutations in neurological syndromes with features of mitochondrial dysfunction and an unknown molecular cause. 2) Development and set up of diagnostic tests for routine clinical purposes. 3) Biochemical characterization of the functional consequences of the identified polγ-α variants. The studies describe new neurological phenotypes in addition to PEO caused by POLG1 mutations, including parkinsonism, premature amenorrhea, ataxia and Parkinson's disease (PD). POLG1 mutations and polymorphisms are both common and/or potential genetic risk factors at least among the Finnish population. The major findings and applications reported here are: 1) POLG1 mutations cause parkinsonism and premature menopause in PEO families in either a recessive or a dominant manner. 2) Common recessive POLG1 mutations (A467T and W748S) in the homozygous state cause severe adult or juvenile-onset ataxia without muscular symptoms or histological or mtDNA abnormalities in muscles. 3) The common recessive pathogenic change A467T can also cause a mild dominant disease in heterozygote carriers.
4) The A467T variant shows reduced polymerase activity due to defective template binding. 5) Rare polyglutamine tract length variants of POLG1 are significantly enriched in Finnish idiopathic Parkinson's disease patients. 6) Dominant mutations are clearly restricted to the highly conserved polymerase domain motifs, whereas recessive ones are more evenly distributed along the protein. The present results highlight and confirm the new role of mitochondria in parkinsonism/Parkinson's disease and describe a new mitochondrial ataxia. Based on these results, a POLG1 diagnostic routine has been set up in Helsinki University Central Hospital (HUSLAB).

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Inherited retinal diseases are the most common cause of vision loss among the working population in Western countries. It is estimated that ~1 of the people worldwide suffer from vision loss due to inherited retinal diseases. The severity of these diseases varies from partial vision loss to total blindness, and at the moment no effective cure exists. To date, nearly 200 mapped loci, including 140 cloned genes for inherited retinal diseases have been identified. By a rough estimation 50% of the retinal dystrophy genes still await discovery. In this thesis we aimed to study the genetic background of two inherited retinal diseases, X-linked cone-rod dystrophy and Åland Island eye disease. X-linked cone-rod dystrophy (CORDX) is characterized by progressive loss of visual function in school age or early adulthood. Affected males show reduced visual acuity, photophobia, myopia, color vision defects, central scotomas, and variable changes in fundus. The disease is genetically heterogeneous and two disease loci, CORDX1 and CORDX2, were known prior to the present thesis work. CORDX1, located on Xp21.1-11.4, is caused by mutations in the RPGR gene. CORDX2 is located on Xq27-28 but the causative gene is still unknown. Åland Island eye disease (AIED), originally described in a family living in Åland Islands, is a congenital retinal disease characterized by decreased visual acuity, fundus hypopigmentation, nystagmus, astigmatism, red color vision defect, myopia, and defective night vision. AIED shares similarities with another retinal disease, congenital stationary night blindness (CSNB2). Mutations in the L-type calcium channel α1F-subunit gene, CACNA1F, are known to cause CSNB2, as well as AIED-like disease. The disease locus of the original AIED family maps to the same genetic interval as the CACNA1F gene, but efforts to reveal CACNA1F mutations in patients of the original AIED family have been unsuccessful. 
The specific aims of this study were to map the disease gene in a large Finnish family with X-linked cone-rod dystrophy and to identify the disease-causing genes in the patients of the Finnish cone-rod dystrophy family and the original AIED family. With the help of linkage and haplotype analyses, we could localize the disease gene of the Finnish cone-rod dystrophy family to the Xp11.4-Xq13.1 region, and thus establish a new genetic X-linked cone-rod dystrophy locus, CORDX3. Mutation analyses of candidate genes revealed three novel CACNA1F gene mutations: IVS28-1 GCGTC>TGG in CORDX3 patients, a 425 bp deletion, comprising exon 30 and flanking intronic regions in AIED patients, and IVS16+2T>C in an additional Finnish patient with a CSNB2-like phenotype. All three novel mutations altered splice sites of the CACNA1F gene, and resulted in defective pre-mRNA splicing suggesting altered or absent channel function as a disease mechanism. The analyses of CACNA1F mRNA also revealed novel alternative wt splice variants, which may enhance channel diversity or regulate the overall expression level of the channel. The results of our studies may be utilized in genetic counseling of the families, and they provide a basis for studies on the pathogenesis of these diseases. In the future, the knowledge of the genetic defects may be used in the identification of specific therapies for the patients.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Multiple sclerosis (MS) is an immune-mediated demyelinating disorder of the central nervous system (CNS) affecting 0.1-0.2% of the population of Northern European descent. MS is considered to be a multifactorial disease in which both environment and genetics play a role in pathogenesis. Despite several decades of intense research, the etiological and pathogenic mechanisms underlying MS remain largely unknown and no curative treatment exists. The genetic architecture underlying MS is complex with multiple genes involved. The strongest and the best characterized predisposing genetic factors for MS are located, as in other immune-mediated diseases, in the major histocompatibility complex (MHC) on chromosome 6. In humans MHC is called human leukocyte antigen (HLA). Alleles of the HLA locus have been found to associate strongly with MS and remained for many years the only consistently replicable genetic associations. However, recently other genes located outside the MHC region have been proposed as strong candidates for susceptibility to MS in several studies. In this thesis a new genetic locus located on chromosome 7q32, interferon regulatory factor 5 (IRF5), was identified as a susceptibility locus for MS. In particular, we found that common variation of the gene was associated with the disease in three different populations, Spanish, Swedish and Finnish. We also suggested a possible functional role for one of the risk alleles with impact on the expression of the IRF5 locus. Previous studies have pointed out a possible role played by chromosome 2q33 in the susceptibility to MS and other autoimmune disorders. The work described here also investigated the involvement of this chromosomal region in MS predisposition. After the detection of genetic association with 2q33 (article-1), we extended our analysis through fine-scale single nucleotide polymorphism (SNP) mapping to define further the contribution of this genomic area to disease pathogenesis (article-4).
We found a trend (p=0.04) for association with MS for an intronic SNP located in the inducible T-cell co-stimulator (ICOS) gene, an important player in the co-stimulatory pathway of the immune system. Expression analysis of ICOS revealed a novel, previously uncharacterized, alternatively spliced isoform, lacking the extracellular domain that is needed for ligand binding. The stability of the newly-identified transcript variant and its subcellular localization were analyzed. These studies indicated that the novel isoform is stable and shows different subcellular localization as compared to full-length ICOS. The novel isoform might have a regulatory function, but further studies are required to elucidate its function. Chromosome 19q13 has been previously suggested as one of the genomic areas involved in MS predisposition. In several populations, suggestive linkage signals between MS predisposition and 19q13 have been obtained. Here, we analysed the role of allelic variation in 19q13 by family based association analysis in 782 MS families collected from Finland. In this dataset, we were not able to detect any statistically significant associations, although several previously suggested markers were included in the analysis. Replication of the previous findings on the basis of linkage disequilibrium between marker allele and disease/risk allele appears notoriously difficult because of limitations such as allelic heterogeneity. Re-sequencing based approaches may be required for elucidating the role of chromosome 19q13 in MS. This thesis has resulted in the identification of a new MS susceptibility locus (IRF5) previously associated with other inflammatory or autoimmune disorders, such as SLE. IRF5 is one of the mediators of interferon biological function. In addition to providing new insight into the possible pathogenetic pathway of the disease, this finding suggests that there might be common mechanisms between different immune-mediated disorders.
Furthermore the work presented here has uncovered a novel isoform of ICOS, which may play a role in regulatory mechanisms of ICOS, an important mediator of lymphocyte activation. Further work is required to uncover its functions and possible involvement of the ICOS locus in MS susceptibility.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

It has been reported that both OLR1 and PCSK9 genes are related to various vascular diseases such as atherosclerosis, cardiovascular disease, peripheral artery disease and stroke, in particular ischemic stroke. The prevalences of the PCSK9 rs505151 and OLR1 rs11053646 variants in ischemic stroke were 0.005 and 0.116, respectively. However, to date, the association between the OLR1 rs11053646 and PCSK9 rs505151 polymorphisms and the risk of ischemic stroke remains unclear and inconclusive. Therefore, this first meta-analysis was carried out to clarify the presumed influence of genetic polymorphisms on ischemic stroke, by analyzing the complete coverage of all relevant studies. All eligible case-control and cohort studies that met the search term were retrieved in multiple scientific databases. Data of interest such as demographic data and genotyping methods were extracted from each study, and the meta-analysis was performed using RevMan 5.3 and Metafor R 3.2.1. The pooled odds ratios (ORs) and 95% confidence intervals (CIs) were calculated using both fixed- and random-effect models. A total of seven case-control studies encompassing 1897 ischemic stroke cases and 2119 healthy controls were critically evaluated. Pooled results from the genetic models indicated that OLR1 rs11053646 dominant (OR=1.33, 95%CI:1.11-1.58) and co-dominant models (OR=1.24, 95%CI:1.02-1.51) were significantly associated with ischemic stroke. For the PCSK9 rs505151 polymorphism, the OR of the co-dominant model (OR=1.36, 95%CI:1.01-1.58) was found to be higher among ischemic stroke patients. In conclusion, the current meta-analysis highlighted that the variant alleles of OLR1 rs11053646 G>C and PCSK9 rs505151 A>G may contribute to the susceptibility risk of ischemic stroke.
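For readers unfamiliar with how a pooled odds ratio arises, the following is a minimal sketch of fixed-effect, inverse-variance pooling on the log-OR scale. It is a generic textbook calculation, not the RevMan or metafor procedure used in the study. The function name and the recovery of each standard error from a reported 95% CI are assumptions of this example, and no data from the meta-analysis are used.

```python
import math
from typing import List, Tuple

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def pool_fixed_effect(
    studies: List[Tuple[float, float, float]],
) -> Tuple[float, float, float]:
    """Pool (OR, CI lower, CI upper) triples with inverse-variance weights.

    Each standard error is recovered from the CI width on the log scale,
    which assumes the reported interval is a Wald-type 95% CI.
    Returns the pooled OR with its 95% CI.
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for odds_ratio, lower, upper in studies:
        log_or = math.log(odds_ratio)
        std_err = (math.log(upper) - math.log(lower)) / (2 * Z95)
        weight = 1.0 / std_err ** 2  # inverse-variance weight
        total_weight += weight
        weighted_sum += weight * log_or
    pooled_log_or = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (
        math.exp(pooled_log_or),
        math.exp(pooled_log_or - Z95 * pooled_se),
        math.exp(pooled_log_or + Z95 * pooled_se),
    )
```

A random-effects model would add a between-study variance term to each weight; that step is omitted here to keep the sketch short.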

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Partitional clustering algorithms, which partition the dataset into a pre-defined number of clusters, can be broadly classified into two types: algorithms which explicitly take the number of clusters as input and algorithms that take the expected size of a cluster as input. In this paper, we propose a variant of the k-means algorithm and prove that it is more efficient than standard k-means algorithms. An important contribution of this paper is the establishment of a relation between the number of clusters and the size of the clusters in a dataset through the analysis of our algorithm. We also demonstrate that the integration of this algorithm as a pre-processing step in classification algorithms reduces their running-time complexity.
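As a point of reference for the baseline mentioned above, here is a minimal sketch of standard k-means (Lloyd's algorithm) in plain Python. It is not the variant proposed in the abstract, whose details are not given here. The function names and the deterministic initialisation from the first k points are assumptions chosen to keep the example reproducible.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(
    points: Sequence[Point], k: int, max_iter: int = 100
) -> Tuple[List[Point], List[int]]:
    """Standard Lloyd iteration: assign points, then recompute centroids.

    Centroids start at the first k points (an assumption of this sketch;
    random or k-means++ seeding is more common in practice).
    """
    centroids: List[Point] = list(points[:k])
    labels: List[int] = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: mean of each cluster; empty clusters keep their centroid.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])
        if new_labels == labels and new_centroids == centroids:
            break  # converged: nothing changed in this iteration
        labels, centroids = new_labels, new_centroids
    return centroids, labels
```

Each iteration costs on the order of n·k distance evaluations for n points, which is the running time any more efficient variant would be measured against.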