postgresql - How to get from Postgres database to Hadoop Sequence File?


I need to get data from a Postgres database into an Accumulo database. We were hoping to use sequence files to run a map/reduce job for this, but we aren't sure how to start. For internal technical reasons, we need to avoid Sqoop.

Will this be possible without Sqoop? Again, I'm not sure where to start. Do I write a Java class that reads all the records (millions of them) over JDBC and somehow outputs them to an HDFS sequence file?
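For example, would a minimal class along these lines be the right idea? This is only a sketch of what I have in mind, assuming Hadoop's SequenceFile.Writer API; the connection string, table, column, and output path are made up.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class PostgresToSequenceFile {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Path out = new Path("hdfs:///data/mytable.seq");  // hypothetical output path

            try (Connection db = DriverManager.getConnection(
                    "jdbc:postgresql://dbhost/mydb", "user", "pass")) {
                // Postgres only streams results with autocommit off and a fetch size set,
                // which matters when the table has millions of rows.
                db.setAutoCommit(false);
                try (Statement st = db.createStatement()) {
                    st.setFetchSize(10_000);
                    try (ResultSet rs = st.executeQuery("SELECT id, payload FROM mytable");
                         SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                                 SequenceFile.Writer.file(out),
                                 SequenceFile.Writer.keyClass(LongWritable.class),
                                 SequenceFile.Writer.valueClass(Text.class))) {
                        LongWritable key = new LongWritable();
                        Text value = new Text();
                        while (rs.next()) {
                            key.set(rs.getLong("id"));
                            value.set(rs.getString("payload"));  // no delimiter parsing involved
                            writer.append(key, value);
                        }
                    }
                }
            }
        }
    }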

Thanks for any input!

P.S. - I should have mentioned that using a delimited file is the problem we're having now. Some of our long character fields contain the delimiter and therefore don't parse correctly; a field may even have a tab in it. We wanted to go from Postgres straight to HDFS without that parsing step.

You can export the data from the database into CSV, tab-delimited, pipe-delimited, or Ctrl-A (Unicode 0x0001) delimited files. Then you can copy those files to HDFS and run a simple MapReduce job, perhaps consisting of only a mapper, configured to read the file format you used and to output sequence files.
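For the export itself, Postgres' COPY command (or psql's \copy) with a delimiter that cannot appear in your data is one option, and hdfs dfs -put gets the files onto the cluster. Below is a rough, untested sketch of what the mapper-only conversion job could look like, assuming a Ctrl-A-delimited export whose first field is a numeric id; the class names, paths, and key/value layout are only illustrative.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

    public class DelimitedToSequenceFile {

        // Assumes each input line is Ctrl-A-delimited and its first field is a
        // numeric id; the rest of the line becomes the value.
        public static class ConvertMapper
                extends Mapper<LongWritable, Text, LongWritable, Text> {
            private final LongWritable outKey = new LongWritable();
            private final Text outValue = new Text();

            @Override
            protected void map(LongWritable offset, Text line, Context context)
                    throws java.io.IOException, InterruptedException {
                String row = line.toString();
                int sep = row.indexOf('\u0001');
                outKey.set(Long.parseLong(row.substring(0, sep)));
                outValue.set(row.substring(sep + 1));
                context.write(outKey, outValue);
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "postgres export to sequence files");
            job.setJarByClass(DelimitedToSequenceFile.class);
            job.setMapperClass(ConvertMapper.class);
            job.setNumReduceTasks(0);                        // map-only: no shuffle, no reducers
            job.setOutputKeyClass(LongWritable.class);
            job.setOutputValueClass(Text.class);
            job.setOutputFormatClass(SequenceFileOutputFormat.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));    // the exported files on HDFS
            FileOutputFormat.setOutputPath(job, new Path(args[1]));  // where the sequence files go
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Setting the number of reduce tasks to zero makes each mapper write its output sequence file directly, so no shuffle is needed.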

This allows you to distribute the work of creating the sequence files across the servers of the Hadoop cluster.

Also, most likely, this will not be a one-time deal. You will have to load data from the Postgres database into HDFS on a regular basis, and you will be able to tweak the MapReduce job to merge the new data in.

