Learnings from my last assignment

By | December 29, 2019

The year 2019, along with my previous work assignment, has been very interesting, challenging and educational. I happened to work on several different technologies and domains.
Here I am jotting down the learnings from my last assignment. While I learnt many things, one of the most important lessons for me is in sync with a great man’s words: “All I know is, Nothing”.

Standards

  • Importance of coding and logging standards for an organization, or at least a team. Defining those standards upfront saves effort as the application grows bigger, which was quite evident to me.
  • Importance of defining metrics and having a clear distinction between application and business metrics.
  • Identifying standards for publishing events and consuming them.
  • Identifying data storage formats. I worked mainly on Apache Avro and saw the advantages it brings when a schema evolves.
  • Importance of encryption and tokenization standards for the data we deal with. Privacy is no longer a wishlist item, but a law. It also made me understand the difference between the two terms used here.
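The difference between the two terms in the last point can be sketched as follows. Encryption is a reversible transformation driven by a key, while tokenization swaps the value for a random surrogate and keeps the mapping in a separate store. This is a minimal sketch under assumptions: `keystream_xor` is a toy cipher for illustration only (a real system would use a vetted library), and `TokenVault` is a hypothetical in-memory stand-in for a tokenization service.

```python
import hashlib
import secrets


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR the data with a SHA-256 based keystream.

    Illustration only. Applying it twice with the same key and nonce
    returns the original bytes.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


class TokenVault:
    """Hypothetical in-memory vault mapping random tokens to original values."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so one value always maps to one token.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        return self._token_to_value[token]


key = secrets.token_bytes(32)
nonce = secrets.token_bytes(12)
email = "jane@example.com"

# Encryption: anyone holding the key can reverse it, no lookup table needed.
ciphertext = keystream_xor(key, nonce, email.encode())
decrypted = keystream_xor(key, nonce, ciphertext).decode()

# Tokenization: the token is random, so only the vault can map it back.
vault = TokenVault()
token = vault.tokenize(email)
restored = vault.detokenize(token)
```

The practical consequence is that a leaked token reveals nothing without access to the vault, whereas leaked ciphertext is only as safe as the key.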

Apache Avro

  • Importance of defining a schema/model for the data we deal with and how Avro enforces certain data quality checks in the pipeline.
  • I also understood what schema evolution is and how it needs to be a planned move, with schema compatibility checks enforced before any change.
  • Avro IDL makes it super easy to define/design schemas for the data we deal with.
  • Challenges involved with Union and complex union types.
  • Impacts of breaking schema changes on production systems and probable solutions to handle that.
  • Defining our own custom logical types in Avro, for example for encryption and tokenization.
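To make the compatibility check mentioned above concrete, here is a simplified sketch of one rule from Avro schema resolution: data written with an old schema stays readable with a new schema only if every newly added field declares a default. The `Customer` schemas and the `missing_defaults` helper are hypothetical, and the check ignores type promotion, aliases and unions, which a real schema registry would also consider.

```python
def missing_defaults(writer_schema: dict, reader_schema: dict) -> list:
    """Return reader fields that data written with writer_schema cannot satisfy.

    Simplified rule: a reader field is fine if it exists in the writer
    schema or declares a default. Anything else breaks old records.
    """
    writer_fields = {f["name"] for f in writer_schema["fields"]}
    return [
        f["name"]
        for f in reader_schema["fields"]
        if f["name"] not in writer_fields and "default" not in f
    ]


customer_v1 = {
    "type": "record",
    "name": "Customer",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "email", "type": "string"},
    ],
}

# Adds an optional field with a default: old records remain readable.
customer_v2 = {
    "type": "record",
    "name": "Customer",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "email", "type": "string"},
        {"name": "loyalty_tier", "type": ["null", "string"], "default": None},
    ],
}

# Adds a required field without a default: a breaking change for old data.
customer_v3 = {
    "type": "record",
    "name": "Customer",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "email", "type": "string"},
        {"name": "country", "type": "string"},
    ],
}

compatible = missing_defaults(customer_v1, customer_v2)
breaking = missing_defaults(customer_v1, customer_v3)
```

An empty result means the change is safe for consumers that upgrade first; a non-empty result lists the fields that need a default before the schema can be rolled out.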

Kafka

  • I learnt a lot about how Kafka can fit into some applications, especially when they are event based.
  • Difference between System and Business events.
  • Leveraging the Schema Registry to enforce checks while producing messages to and consuming messages from a topic.
  • Kafka headers and how that can be leveraged in the pipeline.
  • Producing data with multiple schemas versus single schema to a single topic.
  • Importance of metadata (Data about data).
  • Kafka Connect and use cases around it.
  • A little bit about Kafka Streams.
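One way headers can help when several schemas share a single topic is to carry the schema name next to the payload, so a consumer can pick the right decoder without inspecting the body. Kafka record headers are key/bytes pairs; everything else below is an assumption for illustration: the `schema-name` header key, the decoder functions, and the use of JSON in place of Avro so the sketch runs without extra libraries.

```python
import json


def header_value(headers, key):
    """Return the last value for a header key decoded as UTF-8, or None."""
    value = None
    for name, raw in headers:
        if name == key:
            value = raw.decode("utf-8")
    return value


def parse_sale(payload: bytes) -> dict:
    # JSON stands in for an Avro decoder in this sketch.
    return {"kind": "sale", **json.loads(payload)}


def parse_preference(payload: bytes) -> dict:
    return {"kind": "preference", **json.loads(payload)}


# Hypothetical registry of decoders keyed by schema name.
DECODERS = {"Sale": parse_sale, "Preference": parse_preference}


def decode(headers, payload: bytes) -> dict:
    """Choose a decoder from the 'schema-name' header and apply it."""
    schema_name = header_value(headers, "schema-name")
    decoder = DECODERS.get(schema_name)
    if decoder is None:
        raise ValueError(f"no decoder registered for schema {schema_name!r}")
    return decoder(payload)


# Headers as a consumer would see them: a list of (key, bytes) pairs.
headers = [("schema-name", b"Sale"), ("source-system", b"pos")]
event = decode(headers, b'{"order_id": "A-17", "amount": 42}')
```

Keeping routing metadata in headers means the payload format can change without touching the dispatch logic.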

Spark

  • Learnt Scala (just enough for Spark), and how to build and execute jobs by leveraging Livy.
  • Understood partitioning, re-partitioning and data shuffles.
  • Consuming messages from Kafka in batch mode.
  • Learnt a few things about executors, executor cores and memory management in Spark.
  • Spark History, Zeppelin notebooks for Spark.
  • Unit testing in Spark and its importance.
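The idea behind partitioning and shuffles can be illustrated without a cluster. In hash partitioning, a record's key decides which partition it lands in, so repartitioning by key moves every record with the same key to the same place, and that movement is the shuffle. The sketch below is plain Python, not Spark's implementation: CRC32 is an assumed stand-in for the key hash, and `sales` is made-up sample data.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Stable hash partitioning: the same key always maps to the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def repartition(records, num_partitions: int) -> dict:
    """Group (key, value) records by target partition, as a shuffle would."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return dict(partitions)


sales = [("cust-1", 10), ("cust-2", 25), ("cust-1", 5), ("cust-3", 40)]
by_partition = repartition(sales, num_partitions=4)

# Both "cust-1" records end up together, so a per-customer aggregation
# can run inside one partition without further data movement.
cust_1_partition = partition_for("cust-1", 4)
```

A skewed key distribution shows up here as one partition holding most of the records, which is the same effect that makes a single Spark task run far longer than the rest.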

Presto/Hive

Airflow

  • Writing workflow as code and understanding of how Airflow works.
  • Creating custom Operators.
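The core of "workflow as code" is that tasks and their dependencies are declared in code as a directed acyclic graph, and a scheduler runs each task only after its upstream tasks finish. The sketch below shows that idea with the standard library only; it does not use Airflow's API, and the task names and the `run_workflow` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks: dict, dependencies: dict) -> list:
    """Run each task after all of its upstream tasks; return the run order.

    dependencies maps a task name to the set of tasks it depends on.
    """
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        tasks[name]()
    return order


log = []
tasks = {
    "extract": lambda: log.append("extract"),
    "validate": lambda: log.append("validate"),
    "transform": lambda: log.append("transform"),
    "load": lambda: log.append("load"),
}

# "load" waits for both branches; both branches wait for "extract".
dependencies = {
    "validate": {"extract"},
    "transform": {"extract"},
    "load": {"validate", "transform"},
}

order = run_workflow(tasks, dependencies)
```

A custom operator in this picture is simply a reusable task type: the dependency graph stays the same, while the work each node performs is packaged behind a common interface.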

Docker and Kubernetes

  • How Docker can help us have $0 infrastructure cost during development and unit testing.
  • Kubernetes is still a partially known area. I learnt about accessing the pods, managing secrets, YAML files.

Design Patterns – Java

  • Going from reading about design patterns to actually seeing them being used.
  • Importance of Unit Testing and Code Reviews.

Domains

I have listed some of the domains that I have worked on. If one cannot traverse across domains for a specific Customer, then we are hardly making use of the richness of the data.

  • Customer – I was able to understand the different challenges a company will have in dealing with Customer data.
  • Sales – All the different attributes related to a transaction and why it's critical to have them available in the system in near real time.
  • Preference – For people who work in marketing, customer preferences play a great role, and with more laws around them, it's important to keep them updated and available to the marketing teams.
  • Loyalty – The success of this program can only be measured when the company can leverage this data to its benefit.

This is a brain dump of the previous assignment I worked on, but it will also serve as a reminder of all the learnings as well as the unknowns. I also plan to write other posts on some of the topics listed here, as I truly believe that “To teach is to learn twice”.
