sql - Resources exceeded during query execution error, Google BigQuery -
any ideas how make query return results on google bigquery? i'm getting resources exceeded error... there 2b rows in dataset. i'm trying artist id appears each user_id.
select user_id, artist, count(*) count [legacy20130831.merged_data] d group each user_id, artist order user_id asc, count desc
an equivalent query on public data, throws same error:
select actor, repository_name, count(*) count [githubarchive:github.timeline] d group each actor, repository_name order actor, count desc
compare same query, plus limit on results returned. 1 works (14 seconds me):
select actor, repository_name, count(*) count [githubarchive:github.timeline] d group each actor, repository_name order actor, count desc limit 100
instead of using limit, go through fraction of user_ids. in case, 1/3 works:
select actor, repository_name, count(*) count [githubarchive:github.timeline] d abs(hash(actor) % 3) = 0 group each actor, repository_name
but want "to artist id appears each user_id". let's go further, , that:
select actor, repository_name, count ( select actor, repository_name, count, row_number() on (partition actor order count desc) rank ( select actor, repository_name, count(*) count [githubarchive:github.timeline] d abs(hash(actor) % 10) = 0 group each actor, repository_name )) rank=1
note time used %10, gets me results faster. might wondering "i want results 1 query, not 10".
there 2 things can that:
- unioning partitioned tables (comma in expression union, not join in bigquery).
- if still exceeding resources, might need materialize table. run original query , save result new table. run rank() algorithm on table, instead of on in-memory group.
if willing share dataset me, provide dataset specific advice (a lot depends on cardinality).
Comments
Post a Comment