In a previous post, I outlined the basic skills any beginner data analyst should know. But where do I go to from there? Well, as mentioned, SQL is one of the basics and is fundamental to your ability to extract and transform data for further analysis. So, in this post, I’ll build on that and show you the basic SQL skills you should know as a beginner data analyst.

So, here we’ll look at concepts like WHERE, GROUP BY, SUM, COUNT, HAVING, and more. By knowing and understanding these concepts, you’ll have a strong foundation in about 75% of everything you’ll use regularly as a data analyst, even later on in your career.

Filters

If you’re only interested in a part of the data in a data set, you’re able to extract that data using filters. With them, you have a variety of options that, based on the filtering criteria you use, will return a filtered data set.

 

WHERE

The WHERE statement lets you extract only those records from your data that meet a specified condition. For example, if you have customer data, you’ll be able to extract the data of customers from a specific country, for instance, Australia, by using the WHERE statement.

 SELECT * FROM Customers WHERE country = “Australia”;

 

AND, OR, and NOT

With AND, OR, and NOT, you can filter the results obtained by using the WHERE statement even further. As such, these operators allow you to add further conditions to your query. For example, let’s say you want to filter your customer data to obtain the data of customers in Spain who speak English.

SELECT * FROM Customers WHERE country = “Spain” AND language = “English”;

 

ORDER BY

Typically, when you get the returned data from filtering, it will be unsorted. The ORDER BY statement allows you to sort this data based on your specific needs or preferences. Here, you’ll, for instance, be able to sort the data alphabetically with the customer names in descending order.

SELECT * FROM Customers WHERE country = “Spain” AND language = “English” ORDER BY CustomerName DESC;

 You can also sort the data in ascending order by using ASC instead of DESC.

 

BETWEEN

Unlike the WHERE clause that filters data based on a specified condition, the BETWEEN statement allows you to extract data within a specific range. You can use it when, let’s say, you want to extract the data of all customers aged between 25 and 40. It’s important to remember that, when using BETWEEN, both the upper and lower limits of your range will be included in the results.

SELECT * from Customers WHERE CustomerAge BETWEEN 25 AND 40;

 

LIKE

 Occasionally, it might be necessary to filter data based on a specific pattern. In this instance, you’ll use the LIKE statement to filter your data by, for instance, the first letter of the customers’ names.

SELECT * from Customers WHERE CustomerName LIKE ‘C%’;

Here, the data will be filtered to include only customers whose names begin with a C. If you only want to see customer data for those customers whose names end with a C, you’ll put the ‘%’ in front of the letter.

 

GROUP BY

The GROUP BY statement allows you to group data based on the criteria you provide. For instance, let’s say you want to group all the customer data into groups based on the country where the customers are from.

SELECT COUNT(CustomerID), Country

FROM Customers

GROUP BY Country;

This statement will list the number of customers in the database in each country and group the numbers by country.

 

HAVING

Because the WHERE statement can’t work with aggregate functions, you’ll need to use the HAVING statement. You’ll use it when you, for instance, want to group the number of customers by country like the example above, but you only want to include countries that have more than 5 customers.

SELECT COUNT(CustomerID), Country

FROM Customers

GROUP BY Country

HAVING COUNT(CustomerID) > 5;

 

Joins

 Joins allow you to combine two or more rows of tables together based on a related column. Let’s say you want to work with the data of an e-commerce business. You’ll then, typically, have tables for Products, Orders, and Customers.

 To then see which customers ordered which products, you’ll need to find the customer ID in the Orders table and then go to the Customer table to see the products purchased. You’ll then need to look at the Product table to see which products the customer bought. Obviously, this is a cumbersome process that joins allow you to simplify.

 

INNER JOIN

 

INNER JOIN allows you to merge tables with a shared column. For instance, using the example above, you could find all orders that have a customer name attached to them. Likewise, any others without customer names will not be shown.

 

SELECT Orders.OrderID, Customers.CustomerName

FROM Orders

INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID;

 

LEFT JOIN

 

The LEFT JOIN statement returns all rows from the left table with matching records from the right table. For instance, you’ll be able to return all the rows from the Customers table with matching records in the Orders table.

 

SELECT Customers.CustomerName, Orders.OrderID

FROM Customers

LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID

ORDER BY Customers.CustomerName;

 

RIGHT JOIN

 

Conversely, RIGHT JOIN allows you to return all the rows from the right table with matching records in the left table. So, for example, you’ll be able to return all rows from the Orders table with matching records in an Employees table, or, in other words, any order employees might have placed.

 

SELECT Orders.OrderID, Employees.LastName, Employees.FirstName

FROM Orders

RIGHT JOIN Employees ON Orders.EmployeeID = Employees.EmployeeID

ORDER BY Orders.OrderID;

 

OUTER JOIN

 

The FULL OUTER JOIN or FULL JOIN statement lets you return rows where there is a matching record in both the left and the right tables. For example, using this statement, you can return all customers and all orders.

 

SELECT Customers.CustomerName, Orders.OrderID

FROM Customers

FULL OUTER JOIN Orders ON Customers.CustomerID=Orders.CustomerID

ORDER BY Customers.CustomerName;

 

SQL Aggregate Functions

 

A SQL function takes an input and returns an output. Here, aggregate functions take rows of a table as input and then return an aggregation of information.

 

MIN

 

The MIN function returns the smallest value in a column. For example, you can use it to return the lowest priced product in a column of prices.

 

SELECT MIN(Price) AS LowestPrice

FROM Products;

 

MAX

 

In contrast, the MAX function returns the highest value in a column. So, you can use it to return the most expensive product.

 

SELECT MAX(Price) AS HighestPrice

FROM Products;

 

AVG

 

The AVG function returns the average value of all the values in a column. For, instance, you can use it to return the average price of all products in a Products table.

 

SELECT AVG(Price)

FROM Products;

 

COUNT

 

The COUNT function returns the number of values in a column. It would, for example, return the total number of products in a Products table.

 

SELECT COUNT(ProductID)

FROM Products;

 

SUM

 

The SUM function returns the total sum of all the values in a numeric column. You can, for instance, use it to return the sum of all the orders in an “OrderDetails” column.

 

SELECT SUM(Quantity)

FROM OrderDetails;

 

In Closing

 

There you go, now you have an idea of the basic SQL skills you should know as a beginner data analyst. Hopefully, these will set you up with the best possible start for your career and from where you can build your skills further.

 

To learn more, check out our blog page where you’ll find tons of resources and free advice or subscribe to our newsletter to get these straight to your inbox.