While setting up Apache Hive, HiveServer2 and Beeline (using vanilla packages instead of some kind of prepackaged Hadoop distribution), I struggled with some permission/user related problems. The error message I got stuck with was something like this:
org.apache.hadoop.security.authorize.AuthorizationException
User: hive is not allowed to impersonate johndoe
While googling around for this, I found some parts of the puzzle, but I didn't really encounter an explanation that connected all the necessary dots to solve the problem for my use case:
- using "simple" Hadoop authentication, with standard Linux users
- HDFS namenode and datanodes are running as user hdfs on a bunch of Hadoop cluster machines, let's call them hadoopXX (hadoop01, hadoop02, ...)
- YARN resource manager and node manager are running as user yarn on these same machines
- HiveServer2 is running as user hive on a separate machine, let's call it work01
- A normal user, e.g. johndoe, also working from this separate machine work01, wants to use Beeline to run a Hive query
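To make the scenario concrete, here is a minimal sketch of the Beeline call that johndoe would run on work01. The port 10000 is HiveServer2's default and is an assumption about this setup, as are the helper name hs2_jdbc_url and the example query; with "simple" authentication the name passed via -n is the user HiveServer2 tries to impersonate.

```shell
#!/bin/sh
# Sketch only: host, port and user come from the example setup described
# above; the helper function name is made up for illustration.

HS2_HOST="work01"
HS2_PORT="10000"      # HiveServer2 default port (assumption, adjust if needed)
QUERY_USER="johndoe"

# Build the JDBC URL Beeline expects for a HiveServer2 at host $1, port $2.
hs2_jdbc_url() {
    printf 'jdbc:hive2://%s:%s\n' "$1" "$2"
}

URL="$(hs2_jdbc_url "$HS2_HOST" "$HS2_PORT")"
echo "Connecting as $QUERY_USER to $URL"

# Only attempt the connection when Beeline is actually installed.
if command -v beeline >/dev/null 2>&1; then
    beeline -u "$URL" -n "$QUERY_USER" -e 'SHOW DATABASES;'
fi
```

Without a working proxy user setup, this is the kind of call that ends in the "not allowed to impersonate" error.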
After quite a bit of rabbit hole crawling I got it working without the "impersonation" error above, as follows:
- Allow the hive user to be a proxy user so that HiveServer2 (which runs as user hive) can impersonate other users (e.g. johndoe). I added this to the Hadoop config core-site.xml:

      <property>
        <name>hadoop.proxyuser.hive.hosts</name>
        <value>*</value>
      </property>
      <property>
        <name>hadoop.proxyuser.hive.groups</name>
        <value>*</value>
      </property>

- The proxy user feature is only available for superusers, so also make sure this hive user belongs to the Linux user group with the name of the HDFS superuser group (usually supergroup, see the dfs.permissions.supergroup config, which newer Hadoop versions call dfs.permissions.superusergroup).
- Make sure the Linux user hive exists and belongs to this superuser group, not only on work01, but also on the hadoopXX machines. Otherwise the HDFS namenode and YARN resource manager won't handle the proxyuser config properly.
- Restart the HDFS namenode and YARN resource manager services, so the new configs in core-site.xml are picked up.
- Restart the HiveServer2 service.
- If you are running the HiveServer2 service in a non-managed/non-daemon way from an interactive shell, it might be necessary to start a new shell/session before restarting the service so that user hive's supergroup membership is picked up as intended.
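The group membership part is easy to get wrong on one of the machines, so a small check script can help. The following is a rough sketch, not what I actually ran: the group name supergroup is the assumed superuser group, the helper names list_has_group and in_group are made up for illustration, and the commands that would change anything are left commented out.

```shell
#!/bin/sh
# Sketch only: run on work01 and on every hadoopXX machine.
# Group and user names are assumptions taken from the setup described above.

SUPERGROUP="supergroup"   # assumed name of the HDFS superuser group
PROXY_USER="hive"

# Return success if the space-separated group list $1 contains group $2.
list_has_group() {
    printf '%s\n' "$1" | tr ' ' '\n' | grep -qxF -- "$2"
}

# Return success if OS user $1 is a member of group $2 on this machine.
in_group() {
    list_has_group "$(id -nG "$1" 2>/dev/null)" "$2"
}

if in_group "$PROXY_USER" "$SUPERGROUP"; then
    echo "$PROXY_USER is in $SUPERGROUP on $(uname -n)"
else
    echo "$PROXY_USER is NOT in $SUPERGROUP on $(uname -n)"
    # Possible fix, to be run as root (commented out on purpose):
    # groupadd -f "$SUPERGROUP"
    # usermod -a -G "$SUPERGROUP" "$PROXY_USER"
fi

# If the hdfs CLI is on the PATH, print what the proxy user settings resolve to.
if command -v hdfs >/dev/null 2>&1; then
    hdfs getconf -confKey "hadoop.proxyuser.$PROXY_USER.hosts"
    hdfs getconf -confKey "hadoop.proxyuser.$PROXY_USER.groups"
fi
```

Note that id reports the groups as configured on the machine, while an already running shell or service keeps the group list it started with, which is why a fresh session and the restarts above still matter.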