How to Create a UDF in Presto-1

By | March 21, 2019


This article explains how to create a UDF (user-defined function) in Presto. When I started looking into creating a UDF, every resource I found used a Maven build; the one I created uses a Gradle build instead. This article covers the following:

  1. Project Structure for Presto UDF
    1. Gradle File
    2. Java Classes
    3. Sample code for the UDF
    4. How to build the jar

The code implements two basic functions, encrypt and decrypt. Note that this is not a recommended approach to encryption; it is only a sample to show how to build a UDF.

Tools used: IntelliJ, Java, Gradle

Source code: GitHub

Presto UDF: Initial Setup

Before you start building the UDF, read the official documentation. It explains the different annotations used when defining functions.

Presto Functions Documentation

Create a new Gradle project in IntelliJ. The structure of my project is shown below.

Update 4/29/2019:

With the Presto Software Foundation established and the code base moved to a different GitHub repository, I have updated all the dependencies in my project accordingly; you can find them on GitHub.

The main change is the move from com.facebook.presto.spi.Plugin to io.prestosql.spi.Plugin.

Project Structure in IntelliJ

Gradle file

All the examples I found on the internet used a pom.xml (Maven) to package the code as a Presto plugin. I tried the same with Gradle. Below is the Gradle build file, which also lists the basic modules required to create a UDF.

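All dependencies declared as ‘implementation’ are bundled into the built jar and are required for the UDF to work. The Presto dependencies (presto-main, presto-spi) are declared ‘compileOnly’: they are needed to compile but are left out of the jar, because the Presto server supplies the SPI at runtime.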


plugins {
    id 'java'
}

group 'com.prestoudf'
version '1.0'

sourceCompatibility = 1.8
targetCompatibility = 1.8

ext {
    // io.prestosql artifacts are versioned 300 and up; 0.213 is a com.facebook.presto version.
    // Set this to match the Presto server you deploy to.
    prestoVersion = '300'
    airliftUnitsVersion = '1.3'
    airliftSliceVersion = '0.36'
    guavaVersion = '27.0.1-jre'
    /* Additional dependencies required for the UDF */
    awsSdkCoreVersion = '1.11.507'
    awsSdkKmsVersion = '1.11.507'
    awsSdkVersion = '1.3.1'
    jasyptVersion = '1.9.2'
}

repositories {
    mavenCentral()
}

dependencies {
    testImplementation group: 'junit', name: 'junit', version: '4.12'
    // 'implementation' dependencies are bundled into the plugin jar
    implementation "com.google.guava:guava:$guavaVersion" // Guava is required for the UDF to work
    implementation "com.amazonaws:aws-java-sdk-core:$awsSdkCoreVersion"
    implementation "com.amazonaws:aws-java-sdk-kms:$awsSdkKmsVersion"
    implementation "com.amazonaws:aws-encryption-sdk-java:$awsSdkVersion"
    implementation "org.jasypt:jasypt:$jasyptVersion"
    // 'compileOnly' dependencies are needed to compile but are not packaged into the jar
    compileOnly "io.prestosql:presto-main:$prestoVersion"
    compileOnly "io.prestosql:presto-spi:$prestoVersion"
    compileOnly "io.airlift:slice:$airliftSliceVersion"
    compileOnly "io.airlift:units:$airliftUnitsVersion"
    testImplementation "io.prestosql:presto-tests:$prestoVersion"
}

jar {
    from sourceSets.main.output
    dependsOn configurations.runtimeClasspath
    // Unpack every runtime dependency into the jar (a "fat jar")
    from {
        configurations.runtimeClasspath.findAll { it.name.endsWith('jar') }.collect { zipTree(it) }
    }
    // Dependencies can contain clashing files; keep the first copy (Gradle 7+ fails otherwise)
    duplicatesStrategy = DuplicatesStrategy.EXCLUDE
    // Drop signature files from signed dependencies; they would not match the merged jar
    exclude 'META-INF/*.RSA', 'META-INF/*.SF', 'META-INF/*.DSA'
}

Write the UDF

In my project, I have created functions for encrypting and decrypting. They are examples only, not a recommended way to encrypt data in Presto. I will walk through one of them and explain it to the best of my understanding.


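// Imports: io.airlift.slice.Slice, static io.airlift.slice.Slices.utf8Slice, io.prestosql.spi.function.{Description,
// ScalarFunction, SqlType}, io.prestosql.spi.type.StandardTypes, org.jasypt.util.text.BasicTextEncryptor.
// `secret` is the encryption key, declared earlier in the same class.
private static final BasicTextEncryptor basicTextEncryptor = new BasicTextEncryptor();
static { basicTextEncryptor.setPasswordCharArray(secret.toCharArray()); } // set once: jasypt rejects password changes after initialization

@Description("Encrypts a string using cipher")
@ScalarFunction("encrypt")
@SqlType(StandardTypes.VARCHAR)
public static Slice encryptString(@SqlType(StandardTypes.VARCHAR) Slice privateData) {
    return utf8Slice(basicTextEncryptor.encrypt(privateData.toStringUtf8()));
}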

Description → a human-readable description of what the function does.

ScalarFunction → the function name used when querying from Presto.

SqlType → on the method, the SQL return type of the function; on a parameter, the SQL type of that argument.

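The method above takes a VARCHAR as input and encrypts it with jasypt's password-based BasicTextEncryptor. Notice that Presto passes VARCHAR values as Slice objects rather than Java Strings, hence the toStringUtf8() and utf8Slice() conversions.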
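The encryption itself is delegated to jasypt. To show what a password-based encrypt/decrypt round trip involves without that dependency, here is a minimal JDK-only sketch. It is not what BasicTextEncryptor does internally (BasicTextEncryptor uses PBEWithMD5AndDES); it swaps in PBKDF2 key derivation with AES-GCM, and the class name PasswordCipher, the iteration count, and the output layout (Base64 of salt, IV, then ciphertext) are illustrative assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical JDK-only stand-in for jasypt's BasicTextEncryptor (different algorithm, same idea).
final class PasswordCipher {
    private static final int SALT_LEN = 16;       // bytes of random salt per message
    private static final int IV_LEN = 12;         // 96-bit nonce, the usual size for GCM
    private static final int ITERATIONS = 65_536; // PBKDF2 work factor
    private static final SecureRandom RANDOM = new SecureRandom();

    private PasswordCipher() {}

    // Builds an AES-GCM cipher keyed by PBKDF2(password, salt).
    private static Cipher cipher(int mode, char[] password, byte[] salt, byte[] iv) {
        try {
            SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
            byte[] key = factory.generateSecret(new PBEKeySpec(password, salt, ITERATIONS, 256)).getEncoded();
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(mode, new SecretKeySpec(key, "AES"), new GCMParameterSpec(128, iv));
            return cipher;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Cipher setup failed", e);
        }
    }

    // Returns Base64(salt || iv || ciphertext-with-tag); a fresh salt and IV are drawn per call.
    static String encrypt(String plain, char[] password) {
        byte[] salt = new byte[SALT_LEN];
        byte[] iv = new byte[IV_LEN];
        RANDOM.nextBytes(salt);
        RANDOM.nextBytes(iv);
        try {
            byte[] ct = cipher(Cipher.ENCRYPT_MODE, password, salt, iv)
                    .doFinal(plain.getBytes(StandardCharsets.UTF_8));
            ByteBuffer out = ByteBuffer.allocate(SALT_LEN + IV_LEN + ct.length);
            out.put(salt).put(iv).put(ct);
            return Base64.getEncoder().encodeToString(out.array());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Encryption failed", e);
        }
    }

    // Reverses encrypt(); throws if the password is wrong or the data was tampered with.
    static String decrypt(String encoded, char[] password) {
        ByteBuffer in = ByteBuffer.wrap(Base64.getDecoder().decode(encoded));
        byte[] salt = new byte[SALT_LEN];
        byte[] iv = new byte[IV_LEN];
        in.get(salt).get(iv);
        byte[] ct = new byte[in.remaining()];
        in.get(ct);
        try {
            byte[] plain = cipher(Cipher.DECRYPT_MODE, password, salt, iv).doFinal(ct);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Decryption failed", e);
        }
    }
}
```

Inside a UDF, the Slice handling would stay as in encryptString: decode the input with toStringUtf8(), call the helper, and wrap the result with utf8Slice(). Because the salt and IV are random, encrypting the same value twice gives different outputs, and PBKDF2 is deliberately slow, so deriving a key on every row would be expensive in a real query.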

Bonus: AWS KMS Decryption

As a bonus, I have also added a function for decrypting data that was encrypted using AWS KMS. This function may need to be adjusted depending on how the data was encrypted with AWS KMS. All the code for this project is available on GitHub.

A UDF can also check the user ID of the caller and restrict access to certain functions; there is an example in my code. This is one option if Apache Ranger is not a feasible solution.
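A minimal, dependency-free sketch of such a check is shown below. It assumes the UDF already has the caller's user name; in Presto, a scalar function can receive a ConnectorSession as its first parameter, and the session exposes the querying user (check the exact accessor in the SPI version you build against). The class UdfAccess and its method are hypothetical, not the code from the project's repository, and it throws SecurityException only to stay free of Presto dependencies; a real UDF would more likely raise a Presto error.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical allow-list check for restricting who may call a UDF.
// In a real UDF the user name would come from the Presto session.
final class UdfAccess {
    private final Set<String> allowedUsers;

    UdfAccess(String... allowedUsers) {
        // Copy into an unmodifiable set so the allow-list cannot change after construction
        this.allowedUsers = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(allowedUsers)));
    }

    // Returns normally if the user may call the function; throws otherwise.
    // Names are compared exactly (case-sensitive); a missing user is always denied.
    void check(String user, String functionName) {
        if (user == null || !allowedUsers.contains(user)) {
            throw new SecurityException("User '" + user + "' may not call " + functionName);
        }
    }
}
```

In a UDF this would be a single call such as `ACCESS.check(user, "decrypt")` (ACCESS being a hypothetical static field) at the top of the function body, so that nothing is decrypted for an unauthorized caller.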

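Run gradle clean build and use the built jar to install the UDF on the Presto server: copy the jar into its own subdirectory under the server's plugin directory, then restart the server.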

Creating the io.prestosql.spi.Plugin file under META-INF/services (src/main/resources/META-INF/services in a Gradle project) is a mandatory step; if it is missing, the UDF cannot be installed on the Presto server.
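Because a badly built jar is an easy mistake to make, it can help to check the jar before copying it to the server. Presto discovers plugins through Java's ServiceLoader mechanism, so the service file must list the fully qualified name of the class implementing io.prestosql.spi.Plugin, and that class must be packaged in the same jar. The JDK-only sketch below checks both; the class PluginJarCheck and its method are hypothetical helpers, not part of Presto or of the project.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical pre-deployment check that a built jar registers a Presto plugin.
final class PluginJarCheck {
    static final String SERVICE_ENTRY = "META-INF/services/io.prestosql.spi.Plugin";

    private PluginJarCheck() {}

    // Returns the problems found in the jar; an empty list means the check passed.
    static List<String> problems(String jarPath) {
        List<String> problems = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            JarEntry service = jar.getJarEntry(SERVICE_ENTRY);
            if (service == null) {
                problems.add("missing " + SERVICE_ENTRY);
                return problems;
            }
            List<String> classNames = new ArrayList<>();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(jar.getInputStream(service), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // ServiceLoader format: one class name per line; '#' starts a comment
                    int hash = line.indexOf('#');
                    String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
                    if (!name.isEmpty()) {
                        classNames.add(name);
                    }
                }
            }
            if (classNames.isEmpty()) {
                problems.add(SERVICE_ENTRY + " lists no plugin class");
            }
            // Each listed class must actually be packaged in the same jar
            for (String name : classNames) {
                if (jar.getJarEntry(name.replace('.', '/') + ".class") == null) {
                    problems.add("plugin class not found in jar: " + name);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot read " + jarPath, e);
        }
        return problems;
    }
}
```

After gradle clean build, calling `PluginJarCheck.problems("build/libs/<jar name>.jar")` should return an empty list; any entry it returns points at the packaging mistake to fix before restarting the server.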

Issues faced:

  • Multiple errors when installing the UDF (I had built an incorrect jar).
  • No easy way to test the functions locally.

Watch out:

In my next post, I will talk about testing these UDFs locally. Every time a UDF is installed or updated, the Presto server needs a restart, so it is better to test locally before deployment.
