My Data Engineer Interview Experience at JPMorgan Chase

JPMorgan Chase & Co. is an American multinational financial services firm headquartered in New York City and incorporated in Delaware. Here is my step-by-step interview experience.

I recently interviewed for a Senior Data Engineer (Data Engineer III) position at JP Morgan Chase. This role would have me working on big data systems and pipelines across their offices in Scotland and Bengaluru.

In this blog post, I'm sharing a comprehensive overview of JP Morgan's Data Engineer interview process and the types of technical questions they asked at each stage. 

Hopefully, this gives some insight for others interested in data roles there!

For data engineering roles like this one, JP Morgan has quite a rigorous interview procedure, with multiple rounds before they make a final hiring decision.

Here's a high-level look:

• Round 1: Screening Call
• Round 2: Core Java, Concurrency, System Design
• Round 3: SQL, Python, AWS
• Round 4: Onsite Interviews, Coding Challenges, System Design, Technology Discussion with Engineering Teams

Clearly, they want to evaluate candidates from multiple angles before bringing someone on board in a critical data infrastructure role!

Next, I'll dive into more specifics on each round.

Round 1: Preliminary Screening Call

My first step was a 30-minute phone screening with a VP-level manager in my potential organisation. Here are some details:

📌 Discussed my previous work experience with cloud platforms, data pipelines, databases, and programming

📌 Asked questions about why I was interested in JP Morgan and my aspirations for the role

📌 Some technical Java questions around versions, wrapper classes, and data types

📌 Discussed how I've used infrastructure as code tools like Terraform

📌 Explained my experience with continuous integration pipelines (Jenkins)

I appreciated them taking the time to assess my motivations and background at a high level in Round 1 before diving deeper technically.

Round 2: Core Java, Multi-threading, Architecture

Next, I had a longer technical interview which lasted around 1 hour and 45 minutes.

This round focused on core Java concepts, Terraform, Jenkins, Java concurrency, and big data concepts such as Spark.

Terraform, Jenkins and Deployments

We started by discussing infrastructure as code practices. Here are the questions asked:

📌 Write Terraform configurations for provisioning an EC2 instance

📌 Explain the Terraform lifecycle for deploying a new cluster on AWS. I explained this with the example of deploying a Presto cluster using the AWS CloudFormation Stack service

📌 Deploying code to the production environment. I explained this by walking through the following flow:

• Push code to GitHub -> GitHub webhook integration with Jenkins -> Jenkins job gets triggered based on changes in Git -> job runs -> code is deployed to prod

Java Concepts

We transitioned into a series of Java questions:

📌 In which Java version did lambda expressions get introduced?

📌 What is the default value for the float primitive type and the Float wrapper class in Java?

📌 Write a Singleton class implementation in Notepad. I wrote the code using a class and object with the Singleton design pattern

📌 Explain Java's HashMap implementation as a data structure

📌 What happens if we don't override the Thread class run() method?

📌 Explain the difference between stubs and skeletons in Remote Method Invocation (RMI)

📌 Write example code using the Java concurrent API with forEach(), forEachEntry(), forEachKey()

Hadoop, MapReduce & Spark

Finally, we explored some bigger data engineering concepts:

📌 Discuss scenarios where you have applied Bloom Filters in Spark-based projects and the benefits they provided. I explained this using an example: suppose I have to build a spell-checker application for a large corpus of words, and I want to efficiently check whether a word is spelt correctly without storing all the words in memory.
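To make the spell-checker idea concrete, here is a minimal sketch of a Bloom filter in plain Python. The hash scheme, bit-array size, and sample words are my own illustrative choices, not what the interviewer or I used; the benefit in a Spark job is the same, a few megabytes of bits can pre-filter a huge lookup without holding the full dictionary in memory.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: probabilistic membership test with no false negatives."""

    def __init__(self, size_bits=1_000_000, num_hashes=5):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, word):
        # Derive k bit positions from salted MD5 digests of the word.
        for i in range(self.num_hashes):
            digest = hashlib.md5(f"{i}:{word}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, word):
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, word):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(word))

# Build the filter once from the dictionary, then check words with a few bit lookups.
bf = BloomFilter()
for w in ("hello", "world", "spark"):
    bf.add(w)

print(bf.might_contain("hello"))  # True
print(bf.might_contain("helo"))   # almost certainly False (false positives possible, false negatives not)
```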

📌 Can you describe some techniques for optimising Spark jobs for better performance and resource management?

📌 Explain the concept of data locality in Hadoop and why it is important for performance.

📌 Some questions on Databricks Job Clusters and SQL Endpoints (How does Photon work in a SQL Warehouse?).

📌 How can you calculate the cost of a Databricks cluster? I was asked this because I mentioned Databricks in my resume. I explained it using DBUs as an example.
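As a rough illustration of the DBU-based reasoning I gave: total cost is roughly (DBUs consumed × DBU price) plus the underlying cloud VM cost. The rates below are made-up placeholders; real DBU prices depend on the workload type, tier, and cloud.

```python
# Hypothetical numbers purely for illustration; check your Databricks pricing page.
dbu_per_hour_per_node = 0.75      # DBUs consumed by one node per hour (assumed)
dbu_price_usd = 0.22              # $ per DBU for this workload tier (assumed)
vm_price_usd_per_hour = 0.50      # underlying cloud VM cost per node-hour (assumed)

num_nodes = 8                     # driver + workers
job_hours = 3

databricks_cost = num_nodes * job_hours * dbu_per_hour_per_node * dbu_price_usd
cloud_cost = num_nodes * job_hours * vm_price_usd_per_hour

print(f"Databricks (DBU) cost: ${databricks_cost:.2f}")
print(f"Cloud VM cost:         ${cloud_cost:.2f}")
print(f"Total estimate:        ${databricks_cost + cloud_cost:.2f}")
```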

📌 Imagine you have a Spark job that involves working with custom data types that are not supported out-of-the-box by Spark. How would you handle these custom data types?

📌 How do you control the number of mappers in MapReduce jobs?

📌 Suppose you have two dataframes, df1 and df2, with the columns below:

• df1 => id, name
• df2 => id, country, address, city, count
• Join the two dataframes and then apply a filter where the country is Singapore.
• Then pivot the output dataframe.
• Finally, order the output so that the Singapore city with the highest count of people appears at the top.

I used window functions to explain this; a PySpark sketch follows below.
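Here is a minimal PySpark sketch of that approach. The column names come from the question, but the sample data, the pivot aggregation, and the exact ordering logic are my assumptions, since the prompt left them open.

```python
from pyspark.sql import SparkSession, functions as F, Window

spark = SparkSession.builder.appName("join-filter-pivot").getOrCreate()

df1 = spark.createDataFrame([(1, "Asha"), (2, "Ben")], ["id", "name"])
df2 = spark.createDataFrame(
    [(1, "Singapore", "12 Marina Blvd", "Marina Bay", 900),
     (2, "Singapore", "3 Temasek Ave", "Tanjong Pagar", 450)],
    ["id", "country", "address", "city", "count"],
)

# 1. Join the two dataframes and keep only Singapore rows.
joined = df1.join(df2, on="id").filter(F.col("country") == "Singapore")

# 2. Pivot: cities become columns, aggregating the count per name.
pivoted = joined.groupBy("name").pivot("city").agg(F.sum("count"))

# 3. Window-rank cities so the one with the highest count appears first.
w = Window.orderBy(F.desc("count"))
ranked = joined.withColumn("rank", F.row_number().over(w)).orderBy("rank")

pivoted.show()
ranked.show()
```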

If you're preparing for data engineering interviews, you can use this Free AI Mentor Chrome extension to practice SQL, Java, Concurrency, System Design, and more.

It stays on the screen, guiding you step by step when you're stuck, breaking down problems, and helping you build a structured approach to solving them. ✅

Round 3: SQL, Python, AWS and General CS

This round was ~90 minutes of writing code:

📌 Write an SQL query to find numbers that appear at least three times consecutively, without interruption.
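The question didn't pin down a schema, so here is one hedged way to express the idea, using a hypothetical table with an ordering column id and a value column num, and lag windows in PySpark rather than the raw SQL I wrote in the interview.

```python
from pyspark.sql import SparkSession, functions as F, Window

spark = SparkSession.builder.appName("consecutive-numbers").getOrCreate()

logs = spark.createDataFrame(
    [(1, 1), (2, 1), (3, 1), (4, 2), (5, 1), (6, 2), (7, 2)],
    ["id", "num"],  # hypothetical column names
)

w = Window.orderBy("id")
result = (
    logs.withColumn("prev1", F.lag("num", 1).over(w))
        .withColumn("prev2", F.lag("num", 2).over(w))
        .filter((F.col("num") == F.col("prev1")) & (F.col("num") == F.col("prev2")))
        .select(F.col("num").alias("consecutive_num"))
        .distinct()
)
result.show()   # -> 1, since it appears three times in a row (ids 1-3)
```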

📌 Write an SQL query to find ... (based on the table below)

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| employee_id   | int     |
| employee_name | varchar |
| manager_id    | int     |
+---------------+---------+

employee_id is the primary key for this table.

Each row of this table indicates that the employee with ID employee_id and name employee_name reports to their direct manager with manager_id.

The head of the company is the employee with employee_id = 1.

Some questions on Python data structures, file handling, decorators & __init__

📌 Write a Python program to replace all characters in a list except a given character
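A small sketch of that list question, assuming "replace" means swapping every element other than the kept character for a placeholder such as "$" (the placeholder is my assumption):

```python
def replace_except(chars, keep, placeholder="$"):
    """Replace every character in the list except `keep` with `placeholder`."""
    return [c if c == keep else placeholder for c in chars]

print(replace_except(list("banana"), "a"))   # ['$', 'a', '$', 'a', '$', 'a']
```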

📌 Two strings are said to be complete if, on concatenation, they contain all 26 English alphabets. For example, "abcdefghi" and "jklmnopqrstuvwxyz" are complete, as together they have all characters from 'a' to 'z'. We are given two sets of sizes n and m respectively, and we need to find the number of pairs that are complete on concatenating each string from set 1 with each string from set 2.
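For the "complete pairs" question, one way I would sketch it is with 26-bit masks: precompute a bitmask of letters for each string, then count pairs whose OR covers all 26 bits. This is a simple O(n·m) version of the idea, not necessarily the optimal solution expected in the interview.

```python
def letter_mask(s):
    """26-bit mask with bit i set if chr(ord('a') + i) occurs in s."""
    mask = 0
    for ch in s.lower():
        if ch.isalpha():
            mask |= 1 << (ord(ch) - ord("a"))
    return mask

def count_complete_pairs(set1, set2):
    full = (1 << 26) - 1
    masks1 = [letter_mask(s) for s in set1]
    masks2 = [letter_mask(t) for t in set2]
    # A pair is complete when the union of its letter sets covers a-z.
    return sum(1 for m1 in masks1 for m2 in masks2 if m1 | m2 == full)

print(count_complete_pairs(["abcdefghi"], ["jklmnopqrstuvwxyz"]))  # 1
```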

📌 Read data from 3 files into a Pandas DataFrame (a small sketch follows after this list):

• Remove some columns
• Filter rows based on indexes
• Search for strings in the DataFrame
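A hedged Pandas sketch of those steps; the file names, column names, and search string are placeholders I made up for illustration.

```python
import pandas as pd

# 1. Read data from 3 files into one DataFrame (hypothetical file names).
files = ["sales_jan.csv", "sales_feb.csv", "sales_mar.csv"]
df = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)

# 2. Remove some columns (assumed column names).
df = df.drop(columns=["internal_id", "notes"], errors="ignore")

# 3. Filter rows based on index positions.
subset = df.iloc[10:100]

# 4. Search for a string anywhere in a text column (assumed column name).
matches = df[df["customer_name"].str.contains("singapore", case=False, na=False)]

print(subset.shape, matches.shape)
```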

One question on the Tree data structure:

📌 Convert a Binary Search Tree into a skewed tree in increasing or decreasing order

You can refer to this here: https://www.geeksforgeeks.org/convert-a-binary-search-tree-into-a-skewed-tree-in-increasing-or-decreasing-order/
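My rough take on the skewed-tree question, written in Python rather than the C++ on GeeksforGeeks: do an in-order traversal, then rebuild a right-skewed chain, reversing the traversal for decreasing order. This is a sketch of the idea, not the exact code I wrote.

```python
class Node:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None

def bst_to_skewed(root, increasing=True):
    """Return a right-skewed tree holding the BST's values in sorted order."""
    values = []

    def inorder(node):
        if node:
            inorder(node.left)
            values.append(node.val)
            inorder(node.right)

    inorder(root)
    if not increasing:
        values.reverse()

    dummy = Node(None)
    cur = dummy
    for v in values:            # chain every value down the right spine
        cur.right = Node(v)
        cur = cur.right
    return dummy.right

# Small BST:   5
#             / \
#            3   7
root = Node(5); root.left = Node(3); root.right = Node(7)
skewed = bst_to_skewed(root, increasing=True)
while skewed:
    print(skewed.val, end=" ")   # 3 5 7
    skewed = skewed.right
```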

Focus on data structures (arrays, strings, lists, queues, stacks, trees) if you are going for the data structures round.

Round 4: Techno-Managerial Interview

A 45-minute VP round. The interview started with my introduction, my areas of expertise, and the tech stack I had worked with.

📌 Explained real-time processing using Spark Streaming and Structured Streaming

📌 What is the difference between Spark Streaming and Structured Streaming? I explained this with a real-life example.

📌 The interviewer asked how we consume data and what approaches we used to consume data in e-commerce applications. I explained how we can expose an API and capture event data, send the data to Mixpanel and then to Kafka using Kafka Streams, and then run batch pipelines such as ETL pipelines on Databricks (using custom jobs) and dump the data into Delta Lake.
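A condensed PySpark Structured Streaming sketch of the event path I described, skipping the Mixpanel hop and using made-up broker, topic, schema, and paths, just to show the Kafka-to-Delta shape (it assumes the Kafka and Delta Lake packages are available on the cluster):

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.appName("events-to-delta").getOrCreate()

# Assumed event schema for the e-commerce clickstream.
event_schema = StructType([
    StructField("user_id", StringType()),
    StructField("event_type", StringType()),
    StructField("ts", LongType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
       .option("subscribe", "ecommerce-events")            # placeholder topic
       .load())

events = (raw.select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
             .select("e.*"))

# Land the stream in Delta Lake; downstream ETL jobs read from this table.
query = (events.writeStream
         .format("delta")
         .option("checkpointLocation", "/tmp/checkpoints/ecommerce-events")
         .start("/tmp/delta/ecommerce_events"))
```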

📌 Asked to map the end-to-end flow of the open-source DataHub Spline project, showing how Spark listeners capture upstream and downstream table metadata. I drew it out on a notepad.

📌 Asked what conditions I would take care of when ingesting lots of Kafka event data and RDBMS data, and how to design an ETL pipeline to capture both streams in near real-time. I explained using Kafka Connect for RDBMS change data capture.
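For the RDBMS side, the CDC piece usually means registering a source connector with Kafka Connect's REST API. Here is a hedged Python sketch of that step; the host names, credentials, and tables are placeholders, and the Debezium config keys shown are from the 1.x MySQL connector line, so treat them as illustrative rather than exact.

```python
import json
import requests

# Illustrative Debezium MySQL source connector config (placeholder values throughout).
connector = {
    "name": "orders-cdc",
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "mysql.internal",
        "database.port": "3306",
        "database.user": "cdc_user",
        "database.password": "change-me",
        "database.server.id": "5400",
        "database.server.name": "shop",                   # used as the topic prefix
        "table.include.list": "shop.orders,shop.payments",
        "database.history.kafka.bootstrap.servers": "broker:9092",
        "database.history.kafka.topic": "schema-changes.shop",
    },
}

# Register the connector with the Kafka Connect REST API (placeholder endpoint).
resp = requests.post(
    "http://kafka-connect:8083/connectors",
    headers={"Content-Type": "application/json"},
    data=json.dumps(connector),
)
resp.raise_for_status()
print("connector registered:", resp.json()["name"])
```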

📌 Does Delta Lake support upserts?
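It does, via MERGE. A minimal PySpark sketch of an upsert with the Delta Lake API; the table path, join key, and sample rows are assumptions for illustration.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-upsert").getOrCreate()

target = DeltaTable.forPath(spark, "/tmp/delta/customers")   # placeholder existing Delta table
updates = spark.createDataFrame(
    [(1, "Asha", "SG"), (99, "New User", "IN")],
    ["id", "name", "country"],
)

# Update matching rows, insert new ones: a classic upsert on the id key.
(target.alias("t")
 .merge(updates.alias("u"), "t.id = u.id")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())
```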

📌 How do you optimise AWS costs? I was given a scenario of unused S3 data and many on-demand instances running. I mapped out S3 tiering, switching to spot instances instead of on-demand where possible, and restricting resources through IAM policies.
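The S3-tiering part of that answer can be expressed as a lifecycle rule. A hedged boto3 sketch with a made-up bucket name, prefix, and transition windows:

```python
import boto3

s3 = boto3.client("s3")

# Move objects to cheaper storage classes as they age, then expire them (example policy).
s3.put_bucket_lifecycle_configuration(
    Bucket="analytics-raw-dumps",            # placeholder bucket
    LifecycleConfiguration={
        "Rules": [{
            "ID": "tier-then-expire",
            "Status": "Enabled",
            "Filter": {"Prefix": "raw/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }]
    },
)
```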

He was very happy with my answers. At the end, I asked what tools and technologies his team uses.

Round 5: Executive Director Interview

This interview was conducted by an Executive Director at JPMorgan Chase & Co. and lasted about 30 minutes.

I introduced myself and discussed my Databricks projects and my data engineering experience at my previous company.

I also explained my day-to-day responsibilities using Python, SQL, and Databricks.

Questions faced:

📌 Explain the Kubernetes cluster used in your projects

📌 What AWS services have you used? I covered compute instances, S3 buckets, Presto, EMR, Glue, Redshift, load balancers, and RDS

📌 Explain the difference between Spark and Hadoop

📌 Steps to migrate on-prem data and ETL jobs to the AWS cloud

Scenario-based questions:

📌 A business user asks you to add a data element to the reporting output that you know provides no value and will slow down performance. What would you do?

📌 You're assigned 3 tasks in a sprint but suddenly get a high-priority new task. How do you manage the workload and update your manager?

The final question was whether I could relocate for this hybrid role.

The next day, I got positive feedback from HR. Fortunately, I was selected for the Senior Data Engineer (Data Engineer III) position at JPMorgan Chase & Co. But at that time, I decided to accept an offer from Walmart for my next career opportunity. 😊

Connect with me for 1:1 personalised mentorship.