My Data Engineer Interview Experience at JPMorgan Chase

JPMorgan Chase & Co. is an American multinational financial services firm headquartered in New York City and incorporated in Delaware. Here is my step-by-step interview experience.

I recently interviewed for a Senior Data Engineer (Data Engineer III) position at JP Morgan Chase. This role would have me working on big data systems and pipelines across their offices in Scotland and Bengaluru.

In this blog post, I'm sharing a comprehensive overview of JP Morgan's Data Engineer interview process and the types of technical questions they asked at each stage. 

Hopefully, this gives some insight for others interested in data roles there!

For data engineering roles like this one, JP Morgan has quite a rigorous interview procedure, with multiple rounds before they make a final hiring decision.

Here's a high-level look:

• Round 1: Screening Call
• Round 2: Core Java, Concurrency, System Design
• Round 3: SQL, Python, AWS
• Round 4: Onsite Interviews, Coding Challenges, System Design, Technology Discussion with Engineering Teams

Clearly, they want to evaluate candidates from multiple angles before bringing someone on board in a critical data infrastructure role!

Next, I'll dive into more specifics on each round.

Round 1: Preliminary Screening Call

My first step was a 30-minute phone screening with a VP-level manager in my potential organisation. Here are some details:

📌 Discussed my previous work experience with cloud platforms, data pipelines, databases, and programming

📌 Asked questions about why I was interested in JP Morgan and my aspirations for the role

📌 Some technical Java questions around versions, wrapper classes, and data types

📌 Discussed how I've used infrastructure as code tools like Terraform

📌 Explained my experience with continuous integration pipelines (Jenkins)

I appreciated them taking the time to assess my motivations and background at a high level in Round 1 before diving deeper technically.

Round 2: Core Java, Multi-threading, Architecture

Next, I had a longer technical interview which lasted around 1 hour and 45 minutes.

This round focused on core Java concepts, Terraform, Jenkins, Java concurrency, and big data concepts such as Spark.

Terraform, Jenkins and Deployments

We started by discussing infrastructure as code practices. Here are the questions asked:

📌 Write Terraform configurations for provisioning an EC2 instance

📌 Explain the Terraform lifecycle for deploying a new cluster on AWS. I explained this with the example of deploying a Presto cluster using the AWS CloudFormation Stack service

📌 Deploying code to the production environment. I explained this by walking through the following flow:

• Push code to GitHub -> GitHub webhook integration with Jenkins -> Jenkins job gets triggered based on changes in Git -> job runs -> code is deployed to prod

Java Concepts

We transitioned into a series of Java questions:

📌 In which Java version did lambda expressions get introduced?

📌 What is the default value for the float primitive type and the Float wrapper class in Java?

📌 Write a Singleton class implementation in Notepad. I wrote the code using a class and object with the Singleton design pattern

📌 Explain Java's HashMap implementation as a data structure

📌 What happens if we don't override the Thread class run() method?

📌 Explain the difference between stubs and skeletons in Remote Method Invocation (RMI)

📌 Write example code using the Java concurrent API with forEach(), forEachEntry(), forEachKey()

Hadoop, MapReduce & Spark

Finally, we explored some bigger data engineering concepts:

📌 Discuss scenarios where you have applied Bloom Filters in Spark-based projects and the benefits they provided. I explained this using an example: suppose I have to build a spell-checker application for a large corpus of words, and I want to efficiently check whether a word is spelt correctly without storing all the words in memory.
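To make the spell-checker idea concrete, here is a minimal sketch of a Bloom filter in plain Python. The hash scheme, bit-array size, and sample words are my own illustrative choices, not what the interviewer or I used; the benefit in a Spark job is the same, a few megabytes of bits can pre-filter a huge lookup without holding the full dictionary in memory.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: probabilistic membership test with no false negatives."""

    def __init__(self, size_bits=1_000_000, num_hashes=5):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, word):
        # Derive k bit positions from salted MD5 digests of the word.
        for i in range(self.num_hashes):
            digest = hashlib.md5(f"{i}:{word}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, word):
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, word):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(word))

# Build the filter once from the dictionary, then check words with a few bit lookups.
bf = BloomFilter()
for w in ("hello", "world", "spark"):
    bf.add(w)

print(bf.might_contain("hello"))  # True
print(bf.might_contain("helo"))   # almost certainly False (false positives possible, false negatives not)
```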

📌 Can you describe some techniques for optimising Spark jobs for better performance and resource management?

📌 Explain the concept of data locality in Hadoop and why it is important for performance.

📌 Some questions on Databricks Job Clusters and SQL Endpoints (How does Photon work in a SQL Warehouse?).

📌 How can you calculate the cost of a Databricks cluster? I was asked this because I mentioned Databricks in my resume. I explained it using DBUs as an example.
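As a rough illustration of the DBU-based reasoning I gave: total cost is roughly (DBUs consumed × DBU price) plus the underlying cloud VM cost. The rates below are made-up placeholders; real DBU prices depend on the workload type, tier, and cloud.

```python
# Hypothetical numbers purely for illustration; check your Databricks pricing page.
dbu_per_hour_per_node = 0.75      # DBUs consumed by one node per hour (assumed)
dbu_price_usd = 0.22              # $ per DBU for this workload tier (assumed)
vm_price_usd_per_hour = 0.50      # underlying cloud VM cost per node-hour (assumed)

num_nodes = 8                     # driver + workers
job_hours = 3

databricks_cost = num_nodes * job_hours * dbu_per_hour_per_node * dbu_price_usd
cloud_cost = num_nodes * job_hours * vm_price_usd_per_hour

print(f"Databricks (DBU) cost: ${databricks_cost:.2f}")
print(f"Cloud VM cost:         ${cloud_cost:.2f}")
print(f"Total estimate:        ${databricks_cost + cloud_cost:.2f}")
```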

📌 Imagine you have a Spark job that involves working with custom data types that are not supported out-of-the-box by Spark. How would you handle these custom data types?

📌 How do you control the number of mappers in MapReduce jobs?

📌 Suppose you have two dataframes, df1 and df2, with the columns below:

• df1 => id, name
• df2 => id, country, address, city, count
• Join the two dataframes and then apply a filter where the country is Singapore.
• Then pivot the output dataframe.
• Finally, order the output so that the Singapore city with the highest count of people appears at the top.

I used window functions to explain this; a PySpark sketch follows below.
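Here is a minimal PySpark sketch of that approach. The column names come from the question, but the sample data, the pivot aggregation, and the exact ordering logic are my assumptions, since the prompt left them open.

```python
from pyspark.sql import SparkSession, functions as F, Window

spark = SparkSession.builder.appName("join-filter-pivot").getOrCreate()

df1 = spark.createDataFrame([(1, "Asha"), (2, "Ben")], ["id", "name"])
df2 = spark.createDataFrame(
    [(1, "Singapore", "12 Marina Blvd", "Marina Bay", 900),
     (2, "Singapore", "3 Temasek Ave", "Tanjong Pagar", 450)],
    ["id", "country", "address", "city", "count"],
)

# 1. Join the two dataframes and keep only Singapore rows.
joined = df1.join(df2, on="id").filter(F.col("country") == "Singapore")

# 2. Pivot: cities become columns, aggregating the count per name.
pivoted = joined.groupBy("name").pivot("city").agg(F.sum("count"))

# 3. Window-rank cities so the one with the highest count appears first.
w = Window.orderBy(F.desc("count"))
ranked = joined.withColumn("rank", F.row_number().over(w)).orderBy("rank")

pivoted.show()
ranked.show()
```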

If you're preparing for data engineering interviews, you can use this Free AI Mentor Chrome extension to practice SQL, Java, Concurrency, System Design, and more.

It stays on the screen, guiding you step by step when you're stuck, breaking down problems, and helping you build a structured approach to solving them. ✅

Round 3: SQL, Python, AWS and General CS

This round was ~90 minutes of writing code:

📌 Write an SQL query to find numbers that appear at least three times consecutively, without interruption.
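The question didn't pin down a schema, so here is one hedged way to express the idea, using a hypothetical table with an ordering column id and a value column num, and lag windows in PySpark rather than the raw SQL I wrote in the interview.

```python
from pyspark.sql import SparkSession, functions as F, Window

spark = SparkSession.builder.appName("consecutive-numbers").getOrCreate()

logs = spark.createDataFrame(
    [(1, 1), (2, 1), (3, 1), (4, 2), (5, 1), (6, 2), (7, 2)],
    ["id", "num"],  # hypothetical column names
)

w = Window.orderBy("id")
result = (
    logs.withColumn("prev1", F.lag("num", 1).over(w))
        .withColumn("prev2", F.lag("num", 2).over(w))
        .filter((F.col("num") == F.col("prev1")) & (F.col("num") == F.col("prev2")))
        .select(F.col("num").alias("consecutive_num"))
        .distinct()
)
result.show()   # -> 1, since it appears three times in a row (ids 1-3)
```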

📌 Write an SQL query to find ... (based on the table below)

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| employee_id   | int     |
| employee_name | varchar |
| manager_id    | int     |
+---------------+---------+

employee_id is the primary key for this table.

Each row of this table indicates that the employee with ID employee_id and name employee_name reports to their direct manager with manager_id.

The head of the company is the employee with employee_id = 1.

Some questions on Python data structures, file handling, decorators & __init__

📌 Write a Python program to replace all characters in a list except a given character
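A small sketch of that list question, assuming "replace" means swapping every element other than the kept character for a placeholder such as "$" (the placeholder is my assumption):

```python
def replace_except(chars, keep, placeholder="$"):
    """Replace every character in the list except `keep` with `placeholder`."""
    return [c if c == keep else placeholder for c in chars]

print(replace_except(list("banana"), "a"))   # ['$', 'a', '$', 'a', '$', 'a']
```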

📌 Two strings are said to be complete if, on concatenation, they contain all 26 English alphabets. For example, "abcdefghi" and "jklmnopqrstuvwxyz" are complete, as together they have all characters from 'a' to 'z'. We are given two sets of sizes n and m respectively, and we need to find the number of pairs that are complete on concatenating each string from set 1 with each string from set 2.
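For the "complete pairs" question, one way I would sketch it is with 26-bit masks: precompute a bitmask of letters for each string, then count pairs whose OR covers all 26 bits. This is a simple O(n·m) version of the idea, not necessarily the optimal solution expected in the interview.

```python
def letter_mask(s):
    """26-bit mask with bit i set if chr(ord('a') + i) occurs in s."""
    mask = 0
    for ch in s.lower():
        if ch.isalpha():
            mask |= 1 << (ord(ch) - ord("a"))
    return mask

def count_complete_pairs(set1, set2):
    full = (1 << 26) - 1
    masks1 = [letter_mask(s) for s in set1]
    masks2 = [letter_mask(t) for t in set2]
    # A pair is complete when the union of its letter sets covers a-z.
    return sum(1 for m1 in masks1 for m2 in masks2 if m1 | m2 == full)

print(count_complete_pairs(["abcdefghi"], ["jklmnopqrstuvwxyz"]))  # 1
```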

📌 Read data from 3 files into a Pandas DataFrame (a small sketch follows after this list):

• Remove some columns
• Filter rows based on indexes
• Search for strings in the DataFrame
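A hedged Pandas sketch of those steps; the file names, column names, and search string are placeholders I made up for illustration.

```python
import pandas as pd

# 1. Read data from 3 files into one DataFrame (hypothetical file names).
files = ["sales_jan.csv", "sales_feb.csv", "sales_mar.csv"]
df = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)

# 2. Remove some columns (assumed column names).
df = df.drop(columns=["internal_id", "notes"], errors="ignore")

# 3. Filter rows based on index positions.
subset = df.iloc[10:100]

# 4. Search for a string anywhere in a text column (assumed column name).
matches = df[df["customer_name"].str.contains("singapore", case=False, na=False)]

print(subset.shape, matches.shape)
```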

One question on the Tree data structure:

📌 Convert a Binary Search Tree into a skewed tree in increasing or decreasing order

You can refer to this here: https://www.geeksforgeeks.org/convert-a-binary-search-tree-into-a-skewed-tree-in-increasing-or-decreasing-order/
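My rough take on the skewed-tree question, written in Python rather than the C++ on GeeksforGeeks: do an in-order traversal, then rebuild a right-skewed chain, reversing the traversal for decreasing order. This is a sketch of the idea, not the exact code I wrote.

```python
class Node:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None

def bst_to_skewed(root, increasing=True):
    """Return a right-skewed tree holding the BST's values in sorted order."""
    values = []

    def inorder(node):
        if node:
            inorder(node.left)
            values.append(node.val)
            inorder(node.right)

    inorder(root)
    if not increasing:
        values.reverse()

    dummy = Node(None)
    cur = dummy
    for v in values:            # chain every value down the right spine
        cur.right = Node(v)
        cur = cur.right
    return dummy.right

# Small BST:   5
#             / \
#            3   7
root = Node(5); root.left = Node(3); root.right = Node(7)
skewed = bst_to_skewed(root, increasing=True)
while skewed:
    print(skewed.val, end=" ")   # 3 5 7
    skewed = skewed.right
```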

Focus on data structures (arrays, strings, lists, queues, stacks, trees) if you are going for the data structures round.

Round 4: Techno-Managerial Interview

A 45-minute VP round. The interview started with my introduction, my areas of expertise, and the tech stack I had worked with.

📌 Explained real-time processing using Spark Streaming and Structured Streaming

📌 What is the difference between Spark Streaming and Structured Streaming? I explained this with a real-life example.

📌 The interviewer asked how we consume data and what approaches we used to consume data in e-commerce applications. I explained how we can expose an API and capture event data, send the data to Mixpanel and then to Kafka using Kafka Streams, and then run batch pipelines such as ETL pipelines on Databricks (using custom jobs) and dump the data into Delta Lake.
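A condensed PySpark Structured Streaming sketch of the event path I described, skipping the Mixpanel hop and using made-up broker, topic, schema, and paths, just to show the Kafka-to-Delta shape (it assumes the Kafka and Delta Lake packages are available on the cluster):

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.appName("events-to-delta").getOrCreate()

# Assumed event schema for the e-commerce clickstream.
event_schema = StructType([
    StructField("user_id", StringType()),
    StructField("event_type", StringType()),
    StructField("ts", LongType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
       .option("subscribe", "ecommerce-events")            # placeholder topic
       .load())

events = (raw.select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
             .select("e.*"))

# Land the stream in Delta Lake; downstream ETL jobs read from this table.
query = (events.writeStream
         .format("delta")
         .option("checkpointLocation", "/tmp/checkpoints/ecommerce-events")
         .start("/tmp/delta/ecommerce_events"))
```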

📌 Asked to map the end-to-end flow of the open-source DataHub Spline project, showing how Spark listeners capture upstream and downstream table metadata. I drew it out on a notepad.

📌 Asked what conditions I would take care of when ingesting lots of Kafka event data and RDBMS data, and how to design an ETL pipeline to capture both streams in near real-time. I explained using Kafka Connect for RDBMS change data capture.
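For the RDBMS side, the CDC piece usually means registering a source connector with Kafka Connect's REST API. Here is a hedged Python sketch of that step; the host names, credentials, and tables are placeholders, and the Debezium config keys shown are from the 1.x MySQL connector line, so treat them as illustrative rather than exact.

```python
import json
import requests

# Illustrative Debezium MySQL source connector config (placeholder values throughout).
connector = {
    "name": "orders-cdc",
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "mysql.internal",
        "database.port": "3306",
        "database.user": "cdc_user",
        "database.password": "change-me",
        "database.server.id": "5400",
        "database.server.name": "shop",                   # used as the topic prefix
        "table.include.list": "shop.orders,shop.payments",
        "database.history.kafka.bootstrap.servers": "broker:9092",
        "database.history.kafka.topic": "schema-changes.shop",
    },
}

# Register the connector with the Kafka Connect REST API (placeholder endpoint).
resp = requests.post(
    "http://kafka-connect:8083/connectors",
    headers={"Content-Type": "application/json"},
    data=json.dumps(connector),
)
resp.raise_for_status()
print("connector registered:", resp.json()["name"])
```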

📌 Does Delta Lake support upserts?
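It does, via MERGE. A minimal PySpark sketch of an upsert with the Delta Lake API; the table path, join key, and sample rows are assumptions for illustration.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-upsert").getOrCreate()

target = DeltaTable.forPath(spark, "/tmp/delta/customers")   # placeholder existing Delta table
updates = spark.createDataFrame(
    [(1, "Asha", "SG"), (99, "New User", "IN")],
    ["id", "name", "country"],
)

# Update matching rows, insert new ones: a classic upsert on the id key.
(target.alias("t")
 .merge(updates.alias("u"), "t.id = u.id")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())
```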

📌 How do you optimise AWS costs? I was given a scenario of unused S3 data and many on-demand instances running. I mapped out S3 tiering, switching to spot instances instead of on-demand where possible, and restricting resources through IAM policies.
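The S3-tiering part of that answer can be expressed as a lifecycle rule. A hedged boto3 sketch with a made-up bucket name, prefix, and transition windows:

```python
import boto3

s3 = boto3.client("s3")

# Move objects to cheaper storage classes as they age, then expire them (example policy).
s3.put_bucket_lifecycle_configuration(
    Bucket="analytics-raw-dumps",            # placeholder bucket
    LifecycleConfiguration={
        "Rules": [{
            "ID": "tier-then-expire",
            "Status": "Enabled",
            "Filter": {"Prefix": "raw/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }]
    },
)
```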

He was very happy with my answers. At the end, I asked what tools and technologies his team uses.

Round 5: Executive Director Interview

This interview was conducted by an Executive Director at JPMorgan Chase & Co. and lasted about 30 minutes.

I introduced myself and discussed my Databricks projects and my data engineering experience at my previous company.

I also explained my day-to-day responsibilities using Python, SQL, and Databricks.

Questions faced:

📌 Explain the Kubernetes cluster used in your projects

📌 What AWS services have you used? I covered compute instances, S3 buckets, Presto, EMR, Glue, Redshift, load balancers, and RDS

📌 Explain the difference between Spark and Hadoop

📌 Steps to migrate on-prem data and ETL jobs to the AWS cloud

Scenario-based questions:

📌 A business user asks you to add a data element to the reporting output that you know provides no value and will slow down performance. What would you do?

📌 You're assigned 3 tasks in a sprint but suddenly get a high-priority new task. How do you manage the workload and update your manager?

The final question was whether I could relocate for this hybrid role.

The next day, I got positive feedback from HR. Fortunately, I was selected for the Senior Data Engineer (Data Engineer III) position at JPMorgan Chase & Co. But at that time, I decided to accept an offer from Walmart for my next career opportunity. 😊

Connect with me for 1:1 personalised mentorship.