Sqoop

In this post you will get familiar with some more advanced sqoop options. We will be discussing very specific usecase of Postgres database and schema containing UUID and WKB types.

These options are generic enough so stay tuned and keep reading!

You might find them useful even if you working with different database or some other data types.

In our scenario we have a table definition:

Nothing fancy here but let’s se what happen when we want to take data from HDFS and export it to Postgres. We run this sqoop command from inside oozie:

We use minimal set of parameters just to specify that we want to export data:

  • from /user/mszymaniuk/input directory in HDFS
  • to postgres_table in testdb database which is on <postgres_host> machine

Seems reasonable right? But we end up with such set of errors:

So basically Sqoop is not aware of datatypes like UUID or WKB.

Let’s just tell Sqoop to treat all data as String

The command below is slighly modified version of original command. We basically telling sqoop to interpret __id column and wkb column as Strings.

What happens when we run it? It looks promising…:

…But eventually it fails. With postgres exception this time:

Simply speaking postgres is complaining about mismatch in a data type.

Let postgres decide about datatypes.

We need to add one more thing: We need to tell JDBC driver to send all strings untyped so Postgres can decide about the right type. We do that by passing ?stringtype=unspecified to postgres connection string. The final version of the config would look like this:

I just want to stress out that even if you are using other databases and you need to support types not explicitly supported by Sqoop it’s very likely that this technique would help you.