High-throughput genome sequencing has made it possible to sequence large numbers of bacterial genomes and to unravel the dynamics of bacterial populations at unprecedented resolution. In its purest form, this approach provides an assessment of the naturally occurring antigenic diversity within a population. We are using this technology to investigate the population structure of the human bacterial pathogen Group A Streptococcus (GAS), a leading cause of human morbidity and mortality for which no vaccine exists. Two major hurdles facing GAS vaccine design are variable antigen carriage and antigenic heterogeneity. To advance the development of a global GAS vaccine, we have analyzed the genome sequences of over 1500 GAS isolates, primarily from regions where streptococcal infection is endemic. This GAS genome database comprises over 140 emm-sequence types, 37 emm-clusters and 402 multi-locus sequence types. We assess the carriage of 28 purported GAS vaccine antigens and their polymorphism heterogeneity, and provide examples of variation in the context of protein structure. The development of genomic databases for vaccine design is equally applicable to future antigenic, epidemiological and pathogenesis studies at a global level.