How to Create a UDF in Presto-1
This article explains how to create a UDF (user-defined function) in Presto. When I started looking into creating a UDF, all the resources I found used a Maven build, but the one I created uses a Gradle build. This article covers the following:
- Project Structure for Presto UDF
- Gradle File
- Java Classes
- Sample code for UDF
- How to build the jar
The code implements two basic functions, encrypt and decrypt. Note that this is not a recommended approach to encryption; it is only a sample to show how to build a UDF.
We will be using: IntelliJ, Java
Source code: Github
Presto UDF: Initial Setup
Before you start building the UDF, please read the official documentation. It will help you understand the different annotations used when defining functions.
Presto Functions Documentation
Create a new Gradle project in IntelliJ. Below is the structure of my project.
Update 4/29/2019:
With the Presto Foundation established and the code base moved to a different GitHub repo, I have updated all the dependencies in my project accordingly; you can find them on GitHub.
This means moving from com.facebook.presto.spi.Plugin to io.prestosql.spi.Plugin.
Project Structure in IntelliJ
Gradle file
All the examples I found on the internet used a pom.xml to package the entire code as a Presto plugin. I tried the same with Gradle. Below is the Gradle file, which also lists the basic modules required to start creating a UDF.
All the dependencies declared as ‘implementation’ become part of the built jar and are required for the UDF to work.
    plugins {
        id 'java'
    }

    group 'com.prestoudf'
    version '1.0'

    sourceCompatibility = 1.8
    targetCompatibility = 1.8

    ext {
        prestoVersion = '0.213'
        airliftUnitsVersion = '1.3'
        airliftSliceVersion = '0.36'
        guavaVersion = '27.0.1-jre'
        /*Additional Dependencies required for UDF*/
        awsSdkCoreVersion = '1.11.507'
        awsSdkKmsVersion = '1.11.507'
        awsSdkVersion = '1.3.1'
        jasyptVersion = '1.9.2'
    }

    repositories {
        mavenCentral()
    }

    dependencies {
        testImplementation group: 'junit', name: 'junit', version: '4.12'

        // guava is required for the UDF to work
        implementation "com.google.guava:guava:$guavaVersion"
        implementation "com.amazonaws:aws-java-sdk-core:$awsSdkCoreVersion"
        implementation "com.amazonaws:aws-java-sdk-kms:$awsSdkKmsVersion"
        implementation "com.amazonaws:aws-encryption-sdk-java:$awsSdkVersion"
        implementation "org.jasypt:jasypt:$jasyptVersion"

        // provided by the Presto server at runtime, so not bundled into the jar
        compileOnly "io.prestosql:presto-main:$prestoVersion"
        compileOnly "io.prestosql:presto-spi:$prestoVersion"
        compileOnly "io.airlift:slice:$airliftSliceVersion"
        compileOnly "io.airlift:units:$airliftUnitsVersion"

        testImplementation "io.prestosql:presto-tests:$prestoVersion"
    }

    jar {
        from sourceSets.main.output
        dependsOn configurations.runtimeClasspath
        // bundle the runtime dependencies into the jar
        from {
            configurations.runtimeClasspath.findAll { it.name.endsWith('jar') }.collect { zipTree(it) }
        }
        // drop signature files copied from signed dependency jars
        exclude 'META-INF/*.RSA', 'META-INF/*.SF', 'META-INF/*.DSA'
    }
Write the UDF
In my project, I have created functions for encrypting and decrypting. They are examples only, not recommended approaches for actually encrypting data in Presto. I will take one function and explain it to the best of my understanding.
    @Description("Encrypts a string using cipher")
    @ScalarFunction("encrypt")
    @SqlType(StandardTypes.VARCHAR)
    public static Slice encryptString(@SqlType(StandardTypes.VARCHAR) Slice privateData)
    {
        basicTextEncryptor.setPasswordCharArray(secret.toCharArray());
        return utf8Slice(basicTextEncryptor.encrypt(privateData.toStringUtf8()));
    }
Description → What is the purpose of the function?
ScalarFunction → Actual function name that will be used while querying from Presto
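SqlType → The SQL type of the return value (when placed on the method) or of an argument (when placed on a parameter).
The above method takes a string as input and encrypts it using a cipher. Notice that a SQL VARCHAR is represented as a Slice in Presto rather than as a Java String.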
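To make the shape of this function concrete without a Presto or Jasypt dependency, here is a minimal standalone sketch of the same bytes-in, bytes-out round trip. It is a stand-in built on assumptions, not the project's code: byte[] plays the role of Slice, the JDK's AES-GCM with a PBKDF2-derived key replaces Jasypt's BasicTextEncryptor, and the class and method names are hypothetical. Like the original, it is not a recommendation for real encryption.

```java
import javax.crypto.Cipher;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical stand-in for the encrypt/decrypt UDF pair; byte[] replaces Slice.
final class CipherRoundTripSketch {
    private static final int SALT_LEN = 16;
    private static final int IV_LEN = 12;
    private static final SecureRandom RANDOM = new SecureRandom();

    // Derive a 128-bit AES key from the shared secret and a per-message salt.
    private static SecretKeySpec deriveKey(char[] password, byte[] salt) throws GeneralSecurityException {
        PBEKeySpec spec = new PBEKeySpec(password, salt, 65_536, 128);
        byte[] key = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256").generateSecret(spec).getEncoded();
        return new SecretKeySpec(key, "AES");
    }

    // Input: UTF-8 bytes of the plain text. Output: Base64 of salt + IV + ciphertext.
    static byte[] encrypt(byte[] utf8Input, String secret) {
        try {
            byte[] salt = new byte[SALT_LEN];
            byte[] iv = new byte[IV_LEN];
            RANDOM.nextBytes(salt);
            RANDOM.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, deriveKey(secret.toCharArray(), salt), new GCMParameterSpec(128, iv));
            byte[] cipherText = cipher.doFinal(utf8Input);
            byte[] out = new byte[SALT_LEN + IV_LEN + cipherText.length];
            System.arraycopy(salt, 0, out, 0, SALT_LEN);
            System.arraycopy(iv, 0, out, SALT_LEN, IV_LEN);
            System.arraycopy(cipherText, 0, out, SALT_LEN + IV_LEN, cipherText.length);
            return Base64.getEncoder().encode(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    // Reverses encrypt(); fails if the secret is wrong or the data was altered.
    static byte[] decrypt(byte[] base64Input, String secret) {
        try {
            byte[] in = Base64.getDecoder().decode(base64Input);
            byte[] salt = Arrays.copyOfRange(in, 0, SALT_LEN);
            byte[] iv = Arrays.copyOfRange(in, SALT_LEN, SALT_LEN + IV_LEN);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, deriveKey(secret.toCharArray(), salt), new GCMParameterSpec(128, iv));
            return cipher.doFinal(in, SALT_LEN + IV_LEN, in.length - SALT_LEN - IV_LEN);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] encrypted = encrypt("hello presto".getBytes(StandardCharsets.UTF_8), "demo-secret");
        byte[] decrypted = decrypt(encrypted, "demo-secret");
        System.out.println(new String(decrypted, StandardCharsets.UTF_8)); // prints: hello presto
    }
}
```

In the actual UDF, the conversion between text and bytes is what privateData.toStringUtf8() and utf8Slice(...) do on either side of the encryptor call.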
Bonus: AWS KMS Decryption
As a bonus, I have also added a function for decrypting data that was encrypted using AWS KMS. This function might need to be tweaked depending on the approach taken for AWS KMS encryption. All the code for this project is available on GitHub.
We can also check the user ID and restrict access to certain UDFs; you can find an example in my code. That is one option if Apache Ranger is not a feasible solution.
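The user check can be pictured with a small dependency-free sketch. Everything here is hypothetical and not taken from the project: the class name, the allow-list contents, and the placeholder decrypt step. In a real Presto function the user name would come from the session argument passed to the function (assumed here to be ConnectorSession.getUser()), and the function would probably raise a Presto-specific error rather than SecurityException.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of gating a UDF by user name.
final class UdfAccessCheckSketch {
    // Assumed allow-list; a real deployment would load this from configuration.
    private static final Set<String> ALLOWED_USERS =
            Collections.unmodifiableSet(new HashSet<>(Arrays.asList("etl_service", "security_admin")));

    static boolean isAllowed(String user) {
        return user != null && ALLOWED_USERS.contains(user);
    }

    // Stand-in for a guarded UDF body: reject callers that are not on the list.
    static String guardedDecrypt(String user, String cipherText) {
        if (!isAllowed(user)) {
            throw new SecurityException("User " + user + " is not allowed to call decrypt");
        }
        return placeholderDecrypt(cipherText);
    }

    // Placeholder only: reverses the text so the sketch needs no crypto library.
    private static String placeholderDecrypt(String text) {
        return new StringBuilder(text).reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(guardedDecrypt("etl_service", "cba")); // prints: abc
        try {
            guardedDecrypt("analyst", "cba");
        } catch (SecurityException e) {
            System.out.println("denied: " + e.getMessage());
        }
    }
}
```

Keeping the check in one small method makes it easy to reuse across several functions and to test without a running server.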
Run gradle clean build and use the built jar to install the UDF on the Presto server.
Creating the io.prestosql.spi.Plugin file under META-INF/services is a mandatory step; if it is missing, the UDF cannot be installed on the Presto server. The file contains the fully qualified name of the class that implements Plugin.
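Since an incorrectly built jar is easy to produce, a quick check before copying it to the server can save a restart. The following is a minimal sketch using only the JDK; the class name and messages are my own, and it checks just two things mentioned in this article: that the services file is present and that no signature files from dependency jars were left in META-INF.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Locale;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: sanity-checks a built plugin jar before deployment.
final class PluginJarCheck {
    static final String SERVICE_ENTRY = "META-INF/services/io.prestosql.spi.Plugin";

    // Returns a list of human-readable problems; an empty list means both checks passed.
    static List<String> findProblems(String jarPath) {
        List<String> problems = new ArrayList<>();
        // verify=false: we only list entries, so signature verification is not needed
        try (JarFile jar = new JarFile(new File(jarPath), false)) {
            if (jar.getEntry(SERVICE_ENTRY) == null) {
                problems.add("missing " + SERVICE_ENTRY);
            }
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (isSignatureFile(name)) {
                    problems.add("leftover signature file " + name);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return problems;
    }

    // Matches the patterns excluded in the Gradle jar task: *.RSA, *.SF, *.DSA under META-INF.
    static boolean isSignatureFile(String name) {
        String upper = name.toUpperCase(Locale.ROOT);
        return upper.startsWith("META-INF/")
                && (upper.endsWith(".RSA") || upper.endsWith(".SF") || upper.endsWith(".DSA"));
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: PluginJarCheck <path-to-jar>");
            return;
        }
        List<String> problems = findProblems(args[0]);
        if (problems.isEmpty()) {
            System.out.println("jar looks fine");
        } else {
            problems.forEach(System.out::println);
        }
    }
}
```

Run it against the jar produced by the Gradle build; any line it prints points at something to fix in the jar task or the resources folder.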
Issues faced:
- Multiple errors when the UDF was being installed (caused by an incorrectly built jar).
- Testing the function locally.
Watch out:
In my next post, I will talk about testing these UDFs locally. Every time they are installed, the Presto server needs a restart, so it is better to test locally before deployment.
could you please implement Aggregation Function using it? because your way of explaining is quite simple and easy
Hi Can you please gimme an example and what you want to achieve in the UDF?
could you give me your plugin I just want to place that in my presto folder and want to check … could you please just guide me where to put the .jar file
When you run Gradle clean build you should be able to get the jar file. Please share your presto installation structure and how you are starting the server