Top 20 SQL Interview Questions for MAANG Data Science/Analysts Roles

A compilation of 20 questions asked at MAANG (Meta, Apple, Amazon, Netflix and Google).

Mentor

Blog

Q1. What is the difference between DDL, DML and DCL commands in SQL?

A1. DDL stands for Data Definition Language and consists of commands that define the database schema. Some examples are CREATE, ALTER, DROP and TRUNCATE. These commands modify the structure of the database.

DML stands for Data Manipulation Language and consists of commands that manipulate the data itself. Some examples are INSERT, UPDATE and DELETE. These commands modify the data contained within the database.

DCL stands for Data Control Language and consists of commands that deal with rights, permissions and other controls of the database system. Some examples are GRANT and REVOKE. These commands control access to the database.

Q2. Write a query to get the second-highest salary from the employee table.

A2. To get the second highest salary, we first need to rank the salaries in descending order. Then we can limit the result set to the second row, which will contain the second-highest salary.

sqlCopy codeSELECT salary
FROM employee
ORDER BY salary DESC
LIMIT 1 OFFSET 1;

sqlCopy codeSELECT salary FROM employee ORDER BY salary DESC LIMIT 1 OFFSET 1;

The ORDER BY sorts the salaries in descending order.

The LIMIT 1 OFFSET 1 limits the result to 1 row, offset by 1 row, thereby skipping the highest salary and returning the second highest salary.

Q3. How can you retrieve duplicate records from a table?

A3. To retrieve duplicate records from a table, we can use a self-join along with a HAVING clause. The idea is to join the table to itself based on matching column values to identify duplicates.

For example:

sqlCopy codeSELECT t1.*
FROM table t1
INNER JOIN table t2
ON t1.column1 = t2.column1 AND t1.column2 = t2.column2
GROUP BY t1.column1, t1.column2
HAVING COUNT(*) > 1;

This performs a self-join on the same table, matches the rows where the join columns are identical, groups the identical matches and filters for group counts greater than 1 to only return the duplicate rows.

Q4. Explain the primary key and foreign key.

A4. A primary key is a column or set of columns that uniquely identifies each row in the table. It enforces uniqueness and cannot contain NULL values. A table can only have one primary key.

A foreign key is a column or set of columns that establishes a link between data in two tables.

It refers to the primary key in another table, thereby creating a parent-child relationship between the two tables.

Foreign keys ensure data integrity and allow navigation between tables.

Q5. What are views? What are the benefits of using views?

A5. Views are virtual tables based on the result set of a SQL query. We can query views like regular tables.

Some key benefits of using views are:

  • Simplify complex queries: Complex joins and aggregations can be saved as a view to simplify business logic.
    • Restrict data access: We can limit access to sensitive data by creating a view to expose only certain columns and rows.
      • Maintain data consistency: Views always show updated data from the underlying tables. This ensures consistency across applications.
        • Improve performance: Operations like joins and aggregations are pre-computed in views, leading to faster query execution.

          Q6. What is a self-join? When would you use it?

          A6. A self-join is when you join a table to itself, essentially treating it as two separate tables. It is used when you need to create a relation between rows in the same table.

          Use cases include:

          • Querying hierarchical data like organisation charts or parent-child relationships where the parent and child records are stored in the same table.
            • Comparing rows within the same table, like finding employees with salaries higher than their managers.

              Q7. Explain different types of joins.

              A7. The main types of joins are:

              INNER JOIN: Returns rows where there is a match between both tables. It is the most common type of join.

              LEFT JOIN: Returns all rows from the left table and matches rows from the right table. Rows with no match in the right table contain NULLs.

              RIGHT JOIN: Returns all rows from the right table and matched rows from the left table. Unmatched rows contain NULLs.

              FULL OUTER JOIN: Returns all rows from both tables, including unmatched rows with NULLs from either side.

              Q8. What is a stored procedure and what are the benefits of using it?

              A8. A stored procedure is a precompiled collection of SQL statements stored on the database server. It can accept parameters, perform operations and return data.

              Key benefits include:

              • Improved performance - stored procedures are compiled once and reused rather than compiling each time.
                • Code re-use - modular code can be reused from multiple places without rewriting.
                  • Security - permissions can be applied directly to stored procedures.
                    • Reduced network traffic - rather than sending entire SQL queries, only stored procedure calls are sent.

                      Q9. What is the difference between clustered and non-clustered indexes?

                      A9. Clustered indexes physically sort rows in the table based on the index column. Only one clustered index is allowed per table.

                      Non-clustered indexes contain the index value and row locators pointing to the data. Multiple non-clustered indexes can be created and data is not sorted.

                      Clustered indexes are faster to read data in order but slower for write operations. Non-clustered indexes avoid sort overhead but are slower to read certain data.

                      Q10. What is normalization? Explain 1NF, 2NF, 3NF forms.

                      A10. Normalization optimises database structure to reduce data redundancy and anomalies.

                      1NF - atomic values, no repeating columns or groups

                      2NF - no partial dependencies on primary keys

                      3NF - no transitive dependencies on non-primary keys

                      Higher normal forms continue to drive atomicity and reduce inter-column dependencies.

                      Finding optimal normalization requires balancing performance vs redundancy.

                      Q11. How can you prevent SQL injection attacks?

                      A11. Some ways to prevent SQL injection attacks:

                      • Use parameterized queries - This separates query logic from user input by using parameters as placeholders for values.
                        • Validate user input - Whitelist allowable characters, sanitize special chars, and check input length.
                          • Limit database permissions - Only allow the required SELECT/INSERT/UPDATE permissions, don't use ADMIN.
                            • Escape user input - Escape special characters like quotes in user input before using it.
                              • Use stored procedures - These have predefined SQL logic that is resilient to injection.
                                • Disable error messages - Don't show database error info and stack traces to end users.

                                  Q12. What are aggregate functions in SQL? Give 3 examples.

                                  A12. Aggregate functions perform calculations on multiple values to return a single scalar value.

                                  Examples:

                                  • COUNT - Returns the number of rows
                                    • SUM - Calculates the total of values
                                      • AVG - Calculates average of values
                                        • MIN/MAX - Finds minimum/maximum value

                                          Q13. What is the difference between UNION and UNION ALL?

                                          A13. Both combine the result set of two or more SELECT statements.

                                          UNION removes duplicate rows between the individual result sets.

                                          UNION ALL does not remove duplicates, giving a complete combined result set including duplicates.

                                          Q14. What are window functions? Give an example.

                                          A14. Window functions perform calculations across rows of a query result set. They allow access to more than just the current row in a query.

                                          Example:

                                          sqlCopy codeSELECT
                                              Salary,
                                              AVG(Salary) OVER() AS AverageSalary
                                          FROM employees;

                                          This calculates the average salary over the entire window partition, for each row.

                                          Q15. What is a CTE (common table expression) and when would you use it?

                                          A15. A CTE or common table expression is a temporary result set defined in the SQL query. It can then be referenced in subsequent parts of the query.

                                          Use cases:

                                          • Simplify complex logic/subqueries
                                            • Recursive queries
                                              • Handle data transformation in steps
                                                • Avoid duplicate expressions in UNION queries

                                                  Q16. Explain ACID properties and how they are ensured in database transactions.

                                                  A16. ACID refers to key properties of database transactions:

                                                  Atomicity - Ensures the entire transaction succeeds or fails as a whole. If any part fails, the entire transaction is rolled back.

                                                  Consistency - The database is in a valid state when the transaction starts and ends. Referential integrity constraints help maintain consistency.

                                                  Isolation - Concurrent executions of transactions produce results identical to executing them serially. Isolation levels like READ COMMITTED prevent dirty reads.

                                                  Durability - Once committed, a transaction persists even in the event of power loss, crashes or errors. Write-ahead logging helps achieve durability.

                                                  ACID properties are ensured through:

                                                  • Locking mechanisms during transactions
                                                    • Transaction logs to support rolling back
                                                      • Disk storage and write-ahead logging
                                                        • Isolation levels for concurrency control

                                                          This helps maintain data integrity and consistency in databases.

                                                          Q17. How can you find the Nth highest salary in a table without using LIMIT?

                                                          A17. Use ranking window functions like RANK() or DENSE_RANK():

                                                          sqlCopy codeSELECT Salary
                                                          FROM (
                                                            SELECT Salary, RANK() OVER (ORDER BY Salary DESC) rnk
                                                            FROM employees
                                                          ) t
                                                          WHERE rnk = N;
                                                          
                                                          

                                                          This assigns a rank number to each row ordered by salary. Filtering where rank = N gives the Nth salary.

                                                          Q18. What is the difference between HAVING and WHERE?

                                                          A18. WHERE filters rows before aggregation. HAVING filters rows after aggregation.

                                                          WHERE operates on individual rows before GROUP BY. HAVING operates on grouped rows after GROUP BY.

                                                          Q19. What are the advantages of prepared statements?

                                                          A19. The advantages of prepared statements are as follows:

                                                          • Prevention against SQL injection attacks by separating query logic from parameters.
                                                            • Improved performance as the query plan is cached on the first execution and reused.
                                                              • Cleaner code as the query logic and parameters are defined separately.
                                                                • Data type validation and proper escaping are handled by the driver.

                                                                  Q20. How would you optimise a slow-running query?

                                                                  A20. Techniques to optimise slow queries:

                                                                  • Check the execution plan to identify bottlenecks
                                                                    • Create indexes on columns used in JOIN, WHERE, ORDER BY
                                                                      • Tune poorly performing queries and stored procedures
                                                                        • Use partitioning to split large datasets
                                                                          • Optimise data types, expressions and predicates
                                                                            • Analyse stale statistics so the optimiser has updated information

                                                                              Use these questions to prepare for your upcoming SQL interviews. If you have any questions free free to reach out.

                                                                              All the best!