postgresql - How to get from a Postgres database to a Hadoop SequenceFile?
I need to get data from a Postgres database into an Accumulo database. We're hoping to use SequenceFiles and run a MapReduce job to do this, but aren't sure how to start. For internal technical reasons, we need to avoid Sqoop.
Will this be possible without Sqoop? Again, I'm not sure where to start. Do I write a Java class that reads all the records (millions of them) via JDBC and somehow outputs them to an HDFS SequenceFile?
Thanks for any input!
P.S. - I should have mentioned that using a delimited file is the problem we're having now. Some of our long character fields contain the delimiter, and therefore don't parse correctly. A field may even have a tab in it. We wanted to go from Postgres straight to HDFS without parsing.
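For illustration, the parsing failure described above (an unquoted tab inside a field) can be reproduced in a few lines of plain Java; the sample record below is made up. Switching to a delimiter that essentially never occurs in text data, such as Ctrl-A (`\u0001`), avoids the collision:

```java
public class DelimiterDemo {
    public static void main(String[] args) {
        // A two-column record whose second field contains a literal tab.
        String id = "42";
        String note = "line one\tstill the same field";

        // Tab-delimited: the embedded tab splits one field into two.
        String tabRecord = id + "\t" + note;
        System.out.println(tabRecord.split("\t").length);   // 3 columns instead of 2

        // Ctrl-A (\u0001) delimited: the field survives intact.
        String[] cols = (id + "\u0001" + note).split("\u0001");
        System.out.println(cols.length);                    // 2
        System.out.println(cols[1].equals(note));           // true
    }
}
```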
You can export the data from your database to CSV, tab-delimited, pipe-delimited, or Ctrl-A (Unicode 0x0001) delimited files. Then you can copy those files to HDFS and run a very simple MapReduce job, perhaps consisting of only a mapper, configured to read the file format you used and to output SequenceFiles.
This would let you distribute the load of creating the SequenceFiles across the servers of your Hadoop cluster.
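A map-only job along those lines might look like the following sketch. It is only an outline, not a tested implementation: it assumes Hadoop's client libraries are on the classpath, that the input is Ctrl-A-delimited with the first column serving as the record key, and that the input/output paths are passed as arguments; adapt it to the real schema.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class DelimitedToSequenceFile {

    public static class ConvertMapper
            extends Mapper<LongWritable, Text, Text, Text> {
        private final Text outKey = new Text();
        private final Text outValue = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // Split only on the first Ctrl-A: first column becomes the key,
            // the rest of the record is kept as-is in the value.
            String record = line.toString();
            int sep = record.indexOf('\u0001');
            if (sep < 0) return;                 // skip malformed lines
            outKey.set(record.substring(0, sep));
            outValue.set(record.substring(sep + 1));
            context.write(outKey, outValue);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "delimited-to-seqfile");
        job.setJarByClass(DelimitedToSequenceFile.class);
        job.setMapperClass(ConvertMapper.class);
        job.setNumReduceTasks(0);                // map-only: no reduce phase needed
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        TextInputFormat.addInputPath(job, new Path(args[0]));
        SequenceFileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With zero reduce tasks, each mapper writes its SequenceFile output directly, so the conversion parallelizes across however many input splits HDFS provides.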
Also, most likely, this will not be a one-time deal. You'll have to load data from the Postgres database into HDFS on a regular basis, and you'll be able to tweak the MapReduce job to merge the new data in.