While setting up Apache Hive, HiveServer2 and Beeline (using vanilla packages instead of a prepackaged Hadoop distribution), I struggled with some permission- and user-related problems. The error message I got stuck on looked something like this:

    org.apache.hadoop.security.authorize.AuthorizationException
    User: hive is not allowed to impersonate johndoe

While googling around for this, I found some parts of the puzzle, but I didn't really encounter an explanation that connected all the necessary dots to solve the problem for my use case:

  • using "simple" Hadoop Authentication, with standard Linux users
  • HDFS namenode and datanodes are running as user hdfs on a bunch of Hadoop cluster machines, let's call them hadoopXX (hadoop01, hadoop02, ...)
  • YARN resource manager and node managers are running as user yarn on these same machines
  • HiveServer2 is running as user hive on a separate machine, let's call it work01
  • A normal user, e.g. johndoe, also working from this separate machine work01, wants to use Beeline to run a Hive query
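
To make the last point concrete, here is a minimal sketch of the Beeline call that triggers the impersonation. It only builds and prints the command line. The helper name beeline_command is hypothetical, and the port (10000, the HiveServer2 default), the database name default and the sample query are assumptions, not values from my setup.

```python
import shlex


def beeline_command(user, query, host="work01", port=10000, database="default"):
    """Build the argv for a Beeline call against HiveServer2.

    Assumes "simple" authentication, so only a user name is passed
    and no password.
    """
    url = f"jdbc:hive2://{host}:{port}/{database}"
    return ["beeline", "-u", url, "-n", user, "-e", query]


if __name__ == "__main__":
    # Print the shell command johndoe would run on work01.
    print(shlex.join(beeline_command("johndoe", "SHOW TABLES;")))
```

HiveServer2 receives this connection while running as user hive and then tries to act as johndoe towards HDFS and YARN, which is exactly the step the error message complains about.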

After quite a bit of rabbit hole crawling I got it working without the "impersonation" error above, as follows:

  • Allow the hive user to be a proxy user so that HiveServer2 (which runs as user hive) can impersonate other users (e.g. johndoe). I added this to the Hadoop config file core-site.xml:

    <property>
        <name>hadoop.proxyuser.hive.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hive.groups</name>
        <value>*</value>
    </property>
    
  • The proxy user feature is only available for superusers, so also make sure this hive user belongs to the Linux user group with the name of the HDFS superuser group (usually supergroup, see the dfs.permissions.superusergroup config, which was called dfs.permissions.supergroup in older Hadoop releases).

  • Make sure the Linux user hive exists and belongs to this superuser group, not only on work01, but also on the hadoopXX machines. Otherwise the HDFS namenode and YARN resource manager won't handle the proxy user config properly.
  • Restart the HDFS namenode and YARN resource manager services, so the new configs in core-site.xml are picked up.
  • Restart the HiveServer2 service.
  • If you are running the HiveServer2 service in a non-managed/non-daemon way from an interactive shell, it might be necessary to start a new shell/session before restarting the service so that user hive's supergroup membership is picked up as intended.
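
Two of the steps above are easy to get wrong on one machine out of many, so here is a rough sketch of a check that could be run on work01 and on each hadoopXX machine. It is not part of Hadoop or Hive: the function names are hypothetical, the XML sample just repeats the properties shown above, and a real check would read core-site.xml from wherever your Hadoop config directory lives.

```python
import grp
import pwd
import xml.etree.ElementTree as ET

# Sample repeating the two properties from this post (assumption: a real
# check would load the actual core-site.xml contents instead).
SAMPLE_CORE_SITE = """<configuration>
    <property>
        <name>hadoop.proxyuser.hive.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hive.groups</name>
        <value>*</value>
    </property>
</configuration>"""


def proxyuser_settings(xml_text, user):
    """Return the hosts/groups proxy user values for `user`.

    A property that is missing from the config is reported as None.
    """
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        props[name] = value
    prefix = f"hadoop.proxyuser.{user}."
    return {key: props.get(prefix + key) for key in ("hosts", "groups")}


def user_in_group(user, group):
    """True if the local account `user` has `group` as primary or
    supplementary group according to the system's user/group database."""
    try:
        account = pwd.getpwnam(user)
        group_entry = grp.getgrnam(group)
    except KeyError:
        # Either the user or the group does not exist on this machine.
        return False
    return account.pw_gid == group_entry.gr_gid or user in group_entry.gr_mem


if __name__ == "__main__":
    print("proxy user config:", proxyuser_settings(SAMPLE_CORE_SITE, "hive"))
    print("hive in supergroup:", user_in_group("hive", "supergroup"))
```

Note that user_in_group looks at the system's user and group database, not at the groups of an already running shell or service. A membership that shows up here can therefore still be missing from a HiveServer2 process that was started from an old session, which is the situation described in the last bullet.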