Solving JPMorgan's Data Engineering SQL Interview Question

Step-by-step solution to a popular JPMorgan data engineering interview question involving recursive SQL queries to find employees reporting to the company head hierarchically.

Mentor

Blog

Hey there! Nishchay here, a Data Engineer 3 at Walmart.

I've been working in the field of data engineering for quite some time now, and I've had the opportunity to work and interview with some of the biggest companies in the industry.

Today, I want to share with you an interesting question that was asked during one of my interviews at JPMorgan for a Data Engineer position.

The question was related to finding employees who report (directly or indirectly) to the head of the company using SQL.

Letโ€™s look at the question and itโ€™s solution.

Problem Statment

Write an SQL query to find the employee IDs of all employees who directly or indirectly report their work to the head of the company.

Given the following table structure:

Image
  • employee_id is the primary key for this table.
    • Each row of this table indicates that the employee with ID employee_id and name employee_name reports their work to their direct manager with manager_id.
      • The head of the company is the employee with employee_id = 1.

        Your query should return all employee IDs (including the head of the company) who directly or indirectly report to the head of the company.

        Solution

        To achieve the goal of finding all employees who directly or indirectly report to the head of the company (employee with employee_id = 1), we can use a recursive Common Table Expression (CTE) in SQL.

        This approach allows us to traverse the hierarchical manager-employee relationship in the table.

        Here's how you can write the SQL query:

        WITH RECURSIVE EmployeeChain AS (
          -- Base case: Select the head of the company
          SELECT employee_id, manager_id
          FROM Employees
          WHERE employee_id = 1
          UNION ALL
          -- Recursive step: Select employees who report (directly or indirectly) to those already in the chain
          SELECT e.employee_id, e.manager_id
          FROM Employees e
          INNER JOIN EmployeeChain ec ON e.manager_id = ec.employee_id
        )
        SELECT employee_id
        FROM EmployeeChain;
        

        Explanation:

        ๐Ÿ‘‰ Recursive CTE Initialization (WITH RECURSIVE): 

        This part of the query defines a recursive common table expression named EmployeeChain. The CTE is used to recursively find all employees under the head of the company.

        ๐Ÿ‘‰ Base Case: 

        The query starts with the head of the company (employee_id = 1) as the initial member of the chain. This ensures that the head of the company is included in the result set.

        ๐Ÿ‘‰ Recursive Step: 

        The query then recursively joins the Employees table with the EmployeeChain CTE. This step is crucial for traversing the hierarchy.

        It matches each employee's manager_id with the employee_id of employees already included in the EmployeeChain.

        This process repeats until no more employees can be added to the chain, effectively capturing all employees who report directly or indirectly to the head.

        ๐Ÿ‘‰ Final Selection: 

        The final SELECT statement retrieves the employee_id from the EmployeeChain CTE. This result set includes all employees who report, directly or indirectly, to the head of the company.

        This query leverages the hierarchical structure of the manager-employee relationship to find all employees connected to the head of the company, showing the power of recursive CTEs in handling hierarchical data in SQL databases.


        *******

        You can read my full JPMorgan Data Engineer Interview Experience here.

        If you're aiming for top firms and aren't afraid of putting in the hard work, I'd love to be your interview prep buddy!

        Having gone through this process myself, I can share tips and resources on everything from scalable data modeling to estimating query runtimes.

        We can walk through practice questions together, dissect optimal approaches, and get you interview-ready.

        Click here to book a trial session with me.