Google BigQuery - Dealing with evolving schemas
We are a gaming company that stores events (up to 1 billion events per day) in BigQuery. The events are sharded by month and application in order to lower query costs.
Now to our problem.
Our current solution supports adding new types of events, which leads to new versions of the table schema. The version is included in the table names.
For example: events_app1_v2_201308, events_app1_v2_201308
If we add events with new column types in September, we get events_app1_v3_201309.
We have written code that finds the tables involved (for a date range) and makes a union of them à la BigQuery's comma-separated FROM clause.
But we have realised that this will not work when we make unions across different versions of the event tables.
Does anyone have a smart solution for how to deal with this?
Right now we are investigating whether JSON structures could help us. The current solution uses flat columns: [timestamp, eventid, value, value, value, ...]
From https://developers.google.com/bigquery/query-reference#from:
Note: Unlike many other SQL-based systems, BigQuery uses the comma syntax to indicate table unions, not joins. This means you can run a query on several tables with compatible (!?) schemas as follows:
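A minimal sketch of how such a comma-separated union could be generated for a date range, assuming the table naming scheme described above (events_<app>_v<version>_<YYYYMM>). The dataset name `mydataset` and the helper names `month_range`, `sharded_tables` and `union_query` are made up for illustration; this only builds the query string and does not call BigQuery.

```python
from datetime import date


def month_range(start, end):
    """Yield (year, month) pairs from start to end, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield year, month
        month += 1
        if month > 12:
            year, month = year + 1, 1


def sharded_tables(dataset, app, version, start, end):
    """List the monthly shard names for one app and one schema version.

    The naming scheme events_<app>_v<version>_<YYYYMM> follows the question;
    the dataset name is a placeholder.
    """
    return [
        f"[{dataset}.events_{app}_v{version}_{year}{month:02d}]"
        for year, month in month_range(start, end)
    ]


def union_query(columns, tables):
    """Build a query whose FROM clause is a comma-separated table union."""
    return f"SELECT {', '.join(columns)} FROM {', '.join(tables)}"


tables = sharded_tables("mydataset", "app1", 2, date(2013, 8, 1), date(2013, 9, 30))
query = union_query(["timestamp", "eventid"], tables)
print(query)
```

This only works as long as every table in the list has a compatible schema, which is exactly where mixing v2 and v3 tables becomes a problem.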
You should be able to modify the table schema of the old tables to add the new columns, and then the union should match. Note that you can only add columns, not remove them. You can use the tables.patch() method for this, or bq update --schema.
Moreover, as long as the new fields aren't marked as required, the schemas should be considered compatible. If that is not the case, however, it is a bug; let us know if that is what you're experiencing.
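As a rough sketch of the "add columns to the old tables" idea, the snippet below computes the schema an old table would need so that it matches a newer version. It assumes schemas are represented as lists of field dicts with `name`, `type` and `mode` keys; the function name `merge_schemas` and the example fields are hypothetical. It only produces the merged schema as JSON, which would then be applied with tables.patch() or bq update; the actual API call is left out.

```python
import json


def merge_schemas(old_fields, new_fields):
    """Return old_fields plus any fields that only exist in new_fields.

    Existing columns are kept in their original order (columns can be added
    but not removed). New columns must not be REQUIRED, since old rows have
    no values for them.
    """
    merged = [dict(field) for field in old_fields]
    by_name = {field["name"]: field for field in merged}
    for field in new_fields:
        existing = by_name.get(field["name"])
        if existing is None:
            if field.get("mode", "NULLABLE") == "REQUIRED":
                raise ValueError(f"new field {field['name']!r} must not be REQUIRED")
            added = dict(field)
            merged.append(added)
            by_name[added["name"]] = added
        elif existing["type"] != field["type"]:
            raise ValueError(f"type conflict for field {field['name']!r}")
    return merged


# Hypothetical v2 and v3 schemas for the events tables.
schema_v2 = [
    {"name": "timestamp", "type": "TIMESTAMP", "mode": "NULLABLE"},
    {"name": "eventid", "type": "STRING", "mode": "NULLABLE"},
    {"name": "value1", "type": "INTEGER", "mode": "NULLABLE"},
]
schema_v3 = schema_v2 + [{"name": "value2", "type": "STRING", "mode": "NULLABLE"}]

patched_v2 = merge_schemas(schema_v2, schema_v3)
print(json.dumps(patched_v2, indent=2))
```

Once every older table has been patched to the merged schema, the comma-separated union over all months should see matching columns, with the new columns simply being NULL for old rows.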