Publications #

Analysis of Garbage Collection Patterns to Extend Microbenchmarks for Big Data Workloads (2022) #

Peer-reviewed Conference

Built an open-source tool to analyze JVM garbage collection at object-level granularity.

📜 Paper: Available on ACM Digital Library (or PDF)

💾 Code: GitHub

📺 Presentation: WOSP-C at ACM ICPE 2022 (starts at 2:23:57)

CapStyle - Stylized Image Captioning using Deep Learning Models (2019) #

📜 Paper: PDF, Google Scholar

Par-a-graph: Parallelising PageRank (2021) #

Conference

📜 Paper: PDF

💾 Code: GitHub

📺 Presentation: IEEE CCEM 2021 Student Project Showcase

Architecting and Deploying Optimized GANs with minimal footprint for Fashion Synthesis (2021) #

Conference

📜 Paper: PDF

💻 Live Demo: GANs running entirely in your browser

📺 Presentation: STCAI 2021

Pre-Training Reformer Language Models for Abstractive Summarisation (2020) #

📜 Paper: PDF

Open Source Work #

browser-history #

  • Python library to extract history (and more) from various browsers on various platforms.
  • I started this to introduce open source to my peers at university. I wanted people to contribute meaningfully to a real project, beyond just a spelling fix. It more than achieved that goal.
  • Attracted over 25 contributors from many different countries, some of them first-time contributors.
  • Participated in PyCon India 2020 and 2021 DevSprints, again helping people make their first open-source contributions.

michie #

  • A Rust attribute macro library that adds memoization to a function.
  • Co-authored in a mob programming fashion.
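
For readers unfamiliar with memoization, the idea can be sketched in a few lines. This is a language-agnostic illustration written as a Python decorator, not michie's actual Rust API; the names `memoize`, `slow_square`, and `call_log` are hypothetical and exist only for this example.

```python
import functools


def memoize(func):
    """Cache results of func, keyed by its positional arguments."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        # Compute only on a cache miss; arguments must be hashable.
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    return wrapper


# Records every real (uncached) invocation, for demonstration only.
call_log = []


@memoize
def slow_square(n):
    call_log.append(n)
    return n * n
```

Calling `slow_square(4)` twice runs the function body only once; the second call is served from the cache.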

toipe #

  • A terminal-based typing test app written in Rust.

Miscellaneous contributions #

Work Projects #

A few interesting projects I was involved in at my workplaces.

Open Variant data type #

  • Implemented support for reading and processing variant type columns in the e6data query engine.
  • Blog post
  • Place: e6data
  • Involvement: fully owned the project.
  • Impact: first query engine after Spark/Databricks to support querying open variant data. Unlocked more customer use cases.

Distributed hash join #

  • Implemented partitioned, shuffled hash join to allow joins where the build-side table does not fit in memory.
  • Place: e6data
  • Involvement: co-owned the project.
  • Impact: unlocked customer use cases by making it possible to run more queries than before.
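
The general technique can be sketched as follows. This is a minimal single-process illustration of a partitioned hash join, assuming rows are plain dictionaries; it is not the e6data implementation, and the function names (`partition`, `hash_join`, `partitioned_hash_join`) are hypothetical. In a distributed engine, the partitioning step would correspond to shuffling rows across workers so that each worker only builds a hash table for its own partition.

```python
from collections import defaultdict


def partition(rows, key, num_partitions):
    """Split rows into buckets by the hash of the join key."""
    buckets = [[] for _ in range(num_partitions)]
    for row in rows:
        buckets[hash(row[key]) % num_partitions].append(row)
    return buckets


def hash_join(build_rows, probe_rows, key):
    """Classic in-memory hash join over one partition."""
    # Build phase: index the build side by join key.
    table = defaultdict(list)
    for row in build_rows:
        table[row[key]].append(row)
    # Probe phase: look up each probe row's key in the hash table.
    for probe in probe_rows:
        for build in table.get(probe[key], []):
            yield {**build, **probe}


def partitioned_hash_join(build_rows, probe_rows, key, num_partitions=4):
    """Join partition by partition, so only one partition's
    hash table has to be held in memory at a time."""
    build_parts = partition(build_rows, key, num_partitions)
    probe_parts = partition(probe_rows, key, num_partitions)
    # Rows with equal keys always land in the same partition index,
    # so matching pairs are never split across partitions.
    for build_part, probe_part in zip(build_parts, probe_parts):
        yield from hash_join(build_part, probe_part, key)
```

Because both sides are partitioned with the same hash function, each partition pair can be joined independently, which is what makes the work easy to spread across machines.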

Chaos Genius open-source analytics tool #

Incremental scanning of SaaS applications #

  • Built a system to incrementally scan files in Google Drive, OneDrive, etc.
  • Place: Normalyze
  • Involvement: fully owned the project.
  • Impact: a core part of the product.

Custom workflow system #

  • Built a queue-based workflow system.
  • Generic by design, but built to process a large number of items as a single task.
  • Place: Normalyze
  • Involvement: co-owned the project.
  • Impact: a core part of the product.