postgresql - How to get from Postgres database to Hadoop Sequence File?


I need to get data from a Postgres database into an Accumulo database. We're hoping to use sequence files to run a map/reduce job for this, but aren't sure how to start. For internal technical reasons, we need to avoid Sqoop.

Will this be possible without Sqoop? Again, I'm not sure where to start. Could we write a Java class that reads all the records (millions) via JDBC and somehow outputs them to an HDFS sequence file?

Thanks for any input!

P.S. - I should have mentioned that using a delimited file is the problem we're having now. Some of our long character fields contain the delimiter, and therefore don't parse correctly. A field may even have a tab in it. We wanted to go from Postgres straight to HDFS without parsing.

You can export the data from the database to CSV, tab-delimited, pipe-delimited, or Ctrl-A (Unicode 0x0001) delimited files; Ctrl-A is a good choice here precisely because, unlike a tab, it is very unlikely to appear inside your character fields. You can then copy those files to HDFS and run a simple MapReduce job, perhaps consisting only of a mapper, configured to read the file format used and output sequence files.
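For illustration, here is a minimal sketch of such a map-only job. It assumes the exported files are Ctrl-A delimited, that the first field is a usable record key, and that Text key/value types are good enough; the paths and class names are placeholders, so adapt them to your schema.

    // Minimal sketch: map-only job that converts Ctrl-A delimited text on HDFS
    // into SequenceFiles. Paths, key choice, and types are assumptions -- adapt them.
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

    public class DelimitedToSequenceFile {

        public static class ConvertMapper extends Mapper<LongWritable, Text, Text, Text> {
            private final Text outKey = new Text();

            @Override
            protected void map(LongWritable offset, Text line, Context context)
                    throws IOException, InterruptedException {
                // Assume the first Ctrl-A delimited field is the record id / row key.
                String[] fields = line.toString().split("\u0001", 2);
                outKey.set(fields[0]);
                // Keep the full record as the value; parse it further if you need to.
                context.write(outKey, line);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "delimited-to-sequencefile");
            job.setJarByClass(DelimitedToSequenceFile.class);

            job.setMapperClass(ConvertMapper.class);
            job.setNumReduceTasks(0);                        // map-only job

            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(SequenceFileOutputFormat.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);

            FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. /data/export/
            FileOutputFormat.setOutputPath(job, new Path(args[1]));  // e.g. /data/seqfiles/
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }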

This will allow you to distribute the load of creating the sequence files across the servers of the Hadoop cluster.

Also, this is most likely not a one-time deal; you will have to load data from the Postgres database into HDFS on a regular basis, and you'll be able to tweak the MapReduce job to merge the new data in.
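That said, the direct route asked about in the question is also possible: a single Java process can stream rows out of Postgres over JDBC and write a sequence file straight to HDFS, it just won't parallelize the way the MapReduce conversion does. Here is a rough sketch; the JDBC URL, table, columns, and output path are all placeholders, not a drop-in solution.

    // Rough sketch: stream rows from Postgres via JDBC and write them directly
    // to a SequenceFile on HDFS. Connection details, table, and columns are
    // placeholders.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class JdbcToSequenceFile {

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
            Path out = new Path("hdfs:///data/seqfiles/my_table.seq");

            try (Connection db = DriverManager.getConnection(
                         "jdbc:postgresql://dbhost/mydb", "user", "password");
                 Statement st = db.createStatement();
                 SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                         SequenceFile.Writer.file(out),
                         SequenceFile.Writer.keyClass(Text.class),
                         SequenceFile.Writer.valueClass(Text.class))) {

                // Stream results instead of buffering millions of rows in memory.
                db.setAutoCommit(false);
                st.setFetchSize(10000);

                Text key = new Text();
                Text value = new Text();
                try (ResultSet rs = st.executeQuery("SELECT id, payload FROM my_table")) {
                    while (rs.next()) {
                        key.set(rs.getString("id"));
                        value.set(rs.getString("payload"));  // embedded tabs are harmless here
                        writer.append(key, value);
                    }
                }
            }
        }
    }

Since no delimited intermediate file is involved, embedded tabs or other delimiter characters in the data never become a parsing problem with this approach.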

