An Introductory SQL Tutorial: How to Write Simple Queries
How to Query a SQL Database: Make sure that you have a database management application (ex. MySQL Workbench, Sequel Pro). If not, download a database management application and work with your company to connect your database. Understand your database...
Ever heard of SQL? You may have heard about it in the context of data analysis, but never thought it would apply to you as a marketer. Or, you may have thought, "That's for the advanced data users. I could never do that."

Well, you couldn't be more wrong! The most successful marketers are data-driven, and one of the most important parts of being data-driven is collecting data from databases quickly. SQL is one of the most popular tools for doing just that.

If your company already stores data in a database, you may need to learn SQL to access the data. But not to worry: you're in the right place to get started. Let's jump right in.

SQL (often pronounced like "sequel") stands for Structured Query Language, and it's used when companies have a ton of data that they want to manipulate. The beauty of SQL is that anyone working at a company that stores data in a relational database can use it. (And chances are, yours does.)

For example, if you work for a software company and want to pull usage data on your customers, you can do that with SQL. If you're helping develop a website for an ecommerce company that has data about customer purchases, you can use SQL to find out which customers are purchasing which products. Of course, these are just a few of many possible applications.

Think about it this way: Have you ever opened a very large data set in Excel, only for your computer to freeze or even shut down? SQL allows you to access only certain parts of your data at a time so you don't have to download all the data into a CSV, manipulate it, and possibly overload Excel. In other words, SQL takes care of the data analysis that you may be used to doing in Excel.

Before we begin, make sure you have a database management application that will allow you to pull data from your database. Some options include MySQL Workbench or Sequel Pro. Start by downloading one of these options, then talk to your company's IT department about how to connect to your database.
The option you choose will depend on your product's back end, so check with your product team to make sure you select the correct one.

Next, it's important to become accustomed to your database and its hierarchy. If you have multiple databases of data, you'll need to home in on the location of the data you want to work with.

For example, let's pretend we're working with multiple databases about people in the United States. Enter the query:

    SHOW DATABASES;

The results may show that you have a couple of databases for different locations, including one for New England.

Within your database, you'll have different tables containing the data you want to work with. Using the same example above, let's say we want to find out which information is contained in one of the databases. If we use the query:

    SHOW TABLES IN NewEngland;

we'll find that we have tables for each state in New England: people_connecticut, people_maine, people_massachusetts, people_newhampshire, people_rhodeisland, and people_vermont.

Finally, you need to find out which fields are in the tables. Fields are the specific pieces of data that you can pull from your database. For example, if you want to pull someone's address, the field name may not just be "address". It may be separated into address_city, address_state, and address_zip. In order to figure this out, use the query:

    DESCRIBE people_massachusetts;

This provides a list of all of the fields that you can pull using SQL. (SHOW and DESCRIBE are MySQL commands. Other database systems expose the same information in different ways.)

Let's do a quick review of the hierarchy using our New England example:

Our database is NewEngland.
Our tables within that database are people_connecticut, people_maine, people_massachusetts, people_newhampshire, people_rhodeisland, and people_vermont.
Our fields within the people_massachusetts table include address_city, address_state, address_zip, hair_color, age, first_name, and last_name.

Now, let's write some simple SQL queries to pull data from our NewEngland database.

To learn how to write a SQL query, let's use the following example: Who are the people who have red hair in Massachusetts and were born in 2003, organized in alphabetical order?

SELECT chooses the fields that you want displayed in your results. This is the specific piece of information that you want to pull from your database.
In the example above, we want the names of the people who fit the rest of the criteria. Here is the start of our SQL query (it won't run on its own until we add FROM in the next step):

    SELECT first_name, last_name

FROM pinpoints the table that you want to pull the data from. In the earlier section, we learned that there were six tables, one for each of the six states in New England: people_connecticut, people_maine, people_massachusetts, people_newhampshire, people_rhodeisland, and people_vermont. Because we're looking for people in Massachusetts specifically, we'll pull data from that specific table. Here is our SQL query:

    SELECT first_name, last_name
    FROM people_massachusetts;

WHERE allows you to filter a query to be more specific. In our example, we want to filter our query to include only people with red hair who were born in 2003. Let's start with the red hair filter. Here is our SQL query:

    SELECT first_name, last_name
    FROM people_massachusetts
    WHERE hair_color = 'red';

hair_color could have been part of your initial SELECT statement if you'd wanted to look at all of the people in Massachusetts along with their hair color. But if you want to filter to see only people with red hair, you can do so with a WHERE statement.

Besides equals (=), BETWEEN is another operator you can use for conditional queries. A BETWEEN statement is true for values that fall between the specified minimum and maximum values, including the minimum and maximum themselves. In our case, we can use BETWEEN to pull records from a specific year, like 2003. Here's the query:

    SELECT first_name, last_name
    FROM people_massachusetts
    WHERE birth_date BETWEEN '2003-01-01' AND '2003-12-31';

AND allows you to add additional criteria to your WHERE statement. Remember, we want to filter by people who had red hair in addition to people who were born in 2003. Since our WHERE statement is taken up by the red hair criteria, how can we filter by a specific year of birth as well? That's where the AND statement comes in. In this case, the condition added with AND is on a date field, but it doesn't necessarily have to be.
(Note: Check the format of your dates with your product team to make sure they are in the correct format.) Here is our SQL query:

    SELECT first_name, last_name
    FROM people_massachusetts
    WHERE hair_color = 'red'
    AND birth_date BETWEEN '2003-01-01' AND '2003-12-31';

OR can also be used with a WHERE statement. With AND, both conditions must be true for a record to appear in the results (e.g., hair color must be red and the person must be born in 2003). With OR, at least one of the conditions must be true for a record to appear in the results (e.g., hair color must be red or the person must be born in 2003). Here's what an OR statement looks like in action:

    SELECT first_name, last_name
    FROM people_massachusetts
    WHERE hair_color = 'red'
    OR birth_date BETWEEN '2003-01-01' AND '2003-12-31';

NOT is used in a WHERE statement to display records for which the specified condition is untrue. If we wanted to pull up all Massachusetts residents without red hair, we can use the following query:

    SELECT first_name, last_name
    FROM people_massachusetts
    WHERE NOT hair_color = 'red';

Calculations and organization also can be done within a query. That's where the ORDER BY and GROUP BY clauses come in. First, we'll look at our SQL queries with ORDER BY and then GROUP BY. Then, we'll take a brief look at the difference between the two.

An ORDER BY clause allows you to sort by any of the fields that you have specified in the SELECT statement. In this case, let's order by last name. Here is our SQL query:

    SELECT first_name, last_name
    FROM people_massachusetts
    WHERE hair_color = 'red'
    AND birth_date BETWEEN '2003-01-01' AND '2003-12-31'
    ORDER BY last_name;

GROUP BY is similar to ORDER BY, but it aggregates rows that share a value. For example, if you have any duplicates in your data, you can use GROUP BY to count the number of duplicates in your fields.
Here is our SQL query, which counts how many of the red-haired people born in 2003 share each last name:

    SELECT last_name, COUNT(*)
    FROM people_massachusetts
    WHERE hair_color = 'red'
    AND birth_date BETWEEN '2003-01-01' AND '2003-12-31'
    GROUP BY last_name;

To show the difference between an ORDER BY statement and a GROUP BY statement, let's step outside our Massachusetts example briefly to look at a very simple dataset: a list of four employees' ID numbers and names, in which the name Peter appears twice.

If we were to use an ORDER BY statement on this list, the names of the employees would get sorted in alphabetical order, and all four rows would still appear in the result.

If we were to use a GROUP BY statement with a count instead, the employees would be counted based on the number of times their names appeared in the initial table. Because Peter appeared twice in the initial table, the result would show Peter once, with a count of two.

With me so far? Okay, let's return to the SQL query we've been creating about red-haired people in Massachusetts who were born in 2003.

Depending on the amount of data you have in your database, it may take a long time to run your queries. This can be frustrating, especially if you've made an error in your query and now need to wait before continuing. If you want to test a query, the LIMIT clause lets you limit the number of results you get.

For example, if we suspect there are thousands of people who have red hair in Massachusetts, we may want to test out our query using LIMIT before we run it in full to make sure we're getting the information we want. Let's say, for instance, we only want to see the first 100 people in our result. Here is our SQL query:

    SELECT first_name, last_name
    FROM people_massachusetts
    WHERE hair_color = 'red'
    AND birth_date BETWEEN '2003-01-01' AND '2003-12-31'
    ORDER BY last_name
    LIMIT 100;

In addition to retrieving information from a relational database, SQL can also be used to modify the contents of a database. Of course, you'll need permissions to make changes to your company's data.
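To make the ORDER BY, GROUP BY, and LIMIT behavior described above concrete, here is a minimal sketch that runs against an in-memory SQLite database through Python's built-in sqlite3 module, used here as a stand-in for a MySQL connection. The employees table, its IDs, and its names are hypothetical values invented for illustration. The only detail taken from the article is that Peter appears twice.

```python
import sqlite3

# Hypothetical four-row employee list (made-up IDs and names);
# "Peter" appears twice, as in the article's description.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO employees (id, name) VALUES (?, ?)",
    [(1, "Peter"), (2, "John"), (3, "Greg"), (4, "Peter")],
)

# ORDER BY keeps every row and only changes the order they come back in.
ordered = conn.execute(
    "SELECT id, name FROM employees ORDER BY name, id"
).fetchall()

# GROUP BY collapses rows that share a name; COUNT(*) reports how many
# rows were collapsed into each group.
grouped = conn.execute(
    "SELECT name, COUNT(*) FROM employees GROUP BY name ORDER BY name"
).fetchall()

# LIMIT caps the number of rows returned, which is handy for a quick test.
limited = conn.execute(
    "SELECT name FROM employees ORDER BY name, id LIMIT 2"
).fetchall()

print(ordered)  # four rows, sorted by name
print(grouped)  # three rows, one per distinct name, with counts
print(limited)  # only the first two rows of the sorted list
```

The sorted result still has four rows, while the grouped result has three because the two Peter rows are merged into one row with a count of two.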
In case you're ever in charge of managing the contents of a database, here are some queries you should know.

First is the INSERT INTO statement, which is for putting new values into your database. If we want to add a new person to the Massachusetts table, we can do so by first providing the name of the table we want to modify, and the fields within the table we want to add to. Next, we write VALUES with each respective value we want to add. Text values go in single quotes, and the zip code is quoted as well so that its leading zero is kept. Here's what that query could look like:

    INSERT INTO people_massachusetts (address_city, address_state, address_zip, hair_color, age, first_name, last_name)
    VALUES ('Cambridge', 'Massachusetts', '02139', 'blonde', 32, 'Jane', 'Doe');

Alternatively, if you are adding a value to every field in the table, you don't need to specify fields. The values will be added to columns in the order that they are listed in the query, so this form only works if you supply one value per column, in the table's column order:

    INSERT INTO people_massachusetts
    VALUES ('Cambridge', 'Massachusetts', '02139', 'blonde', 32, 'Jane', 'Doe');

If you only want to add values to specific fields, you must specify these fields. Say we only want to insert a record with first_name, last_name, and address_state. We can use the following query:

    INSERT INTO people_massachusetts (first_name, last_name, address_state)
    VALUES ('Jane', 'Doe', 'Massachusetts');

If you want to replace existing values in your database with different values, you can use UPDATE. What if, for example, someone is recorded in the database as having red hair when they actually have brown hair? We can update this record with UPDATE and WHERE statements:

    UPDATE people_massachusetts
    SET hair_color = 'brown'
    WHERE first_name = 'Jane'
    AND last_name = 'Doe';

Or, say there's a problem in your table where some values for "address_state" appear as "Massachusetts" and others appear as "MA".
To change all instances of "MA" to "Massachusetts", we can use a simple query and update multiple records at once:

    UPDATE people_massachusetts
    SET address_state = 'Massachusetts'
    WHERE address_state = 'MA';

Be careful when using UPDATE. If you don't specify which records to change with a WHERE statement, you'll change that field in every record in the table.

DELETE removes records from your table. Like with UPDATE, be sure to include a WHERE statement so you don't accidentally delete every record in the table. For example, if we happened to find several records in our people_massachusetts table for people who actually lived in Maine, we can delete these entries quickly by targeting the address_state field, like so:

    DELETE FROM people_massachusetts
    WHERE address_state = 'Maine';

Now that you've learned how to create a simple SQL query, let's discuss some other tricks that you can use to take your queries up a notch, starting with the asterisk.

When you add an asterisk character to your SQL query, it tells the query that you want to include all the columns of data in your results. In the Massachusetts example we've been using, we've only selected two columns: first_name and last_name. But let's say we had 15 columns of data that we want to see in our results. It would be a pain to type all 15 column names in the SELECT statement. Instead, if you replace the names of those columns with an asterisk, the query will know to pull all of the columns into the results. Here's what the SQL query would look like:

    SELECT *
    FROM people_massachusetts
    WHERE hair_color = 'red'
    AND birth_date BETWEEN '2003-01-01' AND '2003-12-31'
    ORDER BY last_name
    LIMIT 100;

The percent symbol is a wildcard character, meaning it can stand in for any run of characters in a database value, including none at all. Wildcard characters are helpful for locating records that share common characters. They are typically used with the LIKE operator to find a pattern in the data.
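As a small sketch of how the % wildcard behaves with LIKE, the snippet below uses Python's built-in sqlite3 module with an in-memory database as a stand-in for a MySQL connection. The three rows are made-up illustration data, and the two-character value "02" is deliberately unrealistic: it is there only to show that % also matches when nothing follows the prefix. The address_zip column is assumed to be stored as text so that leading zeros survive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# address_zip is TEXT (an assumption) so values like "02139" keep their
# leading zero instead of being stored as the number 2139.
conn.execute(
    "CREATE TABLE people_massachusetts "
    "(first_name TEXT, last_name TEXT, address_zip TEXT)"
)
conn.executemany(
    "INSERT INTO people_massachusetts VALUES (?, ?, ?)",
    [
        ("Jane", "Doe", "02139"),  # starts with "02": matches
        ("Sam", "Lee", "01002"),   # starts with "01": does not match
        ("Ana", "Cruz", "02"),     # exactly "02": matches, % can be empty
    ],
)

# '02%' means: the characters "02" followed by any run of characters.
rows = conn.execute(
    "SELECT first_name, last_name FROM people_massachusetts "
    "WHERE address_zip LIKE '02%' ORDER BY last_name"
).fetchall()

print(rows)
```

Only the rows whose zip code starts with "02" come back, sorted by last name.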
For instance, if we wanted to get the names of every person in our table whose zip code begins with "02", we can write this query:

    SELECT first_name, last_name
    FROM people_massachusetts
    WHERE address_zip LIKE '02%';

Here, "%" stands in for any group of digits that follow "02", so this query turns up any record with a value for address_zip that begins with "02".

Once I started using SQL regularly, I found that one of my go-to queries involved trying to find which people took an action or fulfilled a certain set of criteria within the last 30 days.

Let's pretend today is December 1, 2021. You could create these parameters by making the birth_date span between November 1, 2021 and November 30, 2021. That SQL query would look like this:

    SELECT first_name, last_name
    FROM people_massachusetts
    WHERE hair_color = 'red'
    AND birth_date BETWEEN '2021-11-01' AND '2021-11-30'
    ORDER BY last_name
    LIMIT 100;

But that would require thinking about which dates cover the last 30 days, and you'd have to update this query constantly. Instead, to make the dates automatically span the last 30 days no matter which day it is, you can type this under AND:

    birth_date >= (DATE_SUB(CURDATE(), INTERVAL 30 DAY))

(Note: You'll want to double-check this syntax with your product team because it may differ based on the software you use to pull your SQL queries. DATE_SUB and CURDATE are MySQL functions.)

Your full SQL query would therefore look like this:

    SELECT first_name, last_name
    FROM people_massachusetts
    WHERE hair_color = 'red'
    AND birth_date >= (DATE_SUB(CURDATE(), INTERVAL 30 DAY))
    ORDER BY last_name
    LIMIT 100;

In some cases, you may want to count the number of times that a particular value of a field appears. For example, let's say you want to count the number of times the different hair colors appear for the people you are tallying up from Massachusetts. In this case, COUNT will come in handy so you don't have to manually add up the number of people who have different hair colors or export that information to Excel.
Here's what that SQL query would look like:

    SELECT hair_color, COUNT(hair_color)
    FROM people_massachusetts
    WHERE birth_date BETWEEN '2003-01-01' AND '2003-12-31'
    GROUP BY hair_color;

AVG calculates the average of an attribute in the results of your query, excluding NULL (empty) values. In our example, we could use AVG to calculate the average age of Massachusetts residents in our query. Here's what our SQL query could look like:

    SELECT AVG(age)
    FROM people_massachusetts;

SUM is another simple calculation you can do in SQL. It calculates the total of an attribute across all the records in your query. So, if we wanted to add up all the ages of Massachusetts residents, we can use this query:

    SELECT SUM(age)
    FROM people_massachusetts;

MIN and MAX are two SQL functions that give you the smallest and largest values of a given field. We can use them to find the ages of the youngest and oldest members of our Massachusetts table.

This query will give us the age of the youngest:

    SELECT MIN(age)
    FROM people_massachusetts;

And this query gives us the age of the oldest:

    SELECT MAX(age)
    FROM people_massachusetts;

There may be a time when you need to access information from two different tables in one SQL query. In SQL, you can use a JOIN clause to do this. (For those familiar with Excel formulas, this is similar to using the VLOOKUP formula when you need to combine information from two different sheets in Excel.)

Let's say we have one table that has data of all Massachusetts residents' user IDs and their birthdates. In addition, we have an entirely separate table containing all Massachusetts residents' user IDs and their hair color. If we want to figure out the hair color of Massachusetts residents born in the year 2003, we'd need to access information from both tables and combine them. This works because both tables share a matching column: user IDs.

Because we're calling out fields from two different tables, our SELECT statement is also going to change slightly.
Instead of just listing out the fields we want to include in our results, we'll need to specify which table they're coming from. (Note: The asterisk may come in handy here so your query includes the columns from both tables in your results.)

To specify a field from a specific table, all we have to do is combine the name of the table with the name of the field. For example, our SELECT statement would say "table.field", with the period separating the table name and the field name.

We're also assuming a few things in this case:

The Massachusetts birthdate table is named birthdate_massachusetts and includes the fields first_name, last_name, user_id, and birth_date.
The Massachusetts hair color table is named haircolor_massachusetts and includes the fields user_id and hair_color.

Your SQL query would therefore look like:

    SELECT birthdate_massachusetts.first_name, birthdate_massachusetts.last_name
    FROM birthdate_massachusetts
    JOIN haircolor_massachusetts USING (user_id)
    WHERE hair_color = 'red'
    AND birth_date BETWEEN '2003-01-01' AND '2003-12-31'
    ORDER BY last_name;

This query would join the two tables using the field "user_id", which appears in both the birthdate_massachusetts table and the haircolor_massachusetts table. You're then able to see a table of people born in 2003 who have red hair.

Use a CASE statement when you want to return different results to your query based on which condition is met. Conditions are evaluated in order. Once a condition is met, the corresponding result is returned and all following conditions are skipped over.

You can include an ELSE condition at the end in case no conditions are met. Without an ELSE, the query will return NULL if no conditions are met.

Here's an example of using CASE to return a string based on the query. The CASE expression goes in the SELECT list, alongside the other fields:

    SELECT first_name, last_name,
    CASE
        WHEN hair_color = 'brown' THEN 'This person has brown hair.'
        WHEN hair_color = 'blonde' THEN 'This person has blonde hair.'
        WHEN hair_color = 'red' THEN 'This person has red hair.'
        ELSE 'Hair color not known.'
    END
    FROM people_massachusetts;

Congratulations, you're ready to run your own SQL queries! While there's a lot more you can do with SQL, I hope you found this overview of the basics helpful so you can get your hands dirty.
With a strong foundation of the basics, you'll be able to navigate SQL better and work toward some of the more complex examples.

Editor's note: This post was originally published in March 2015 and has been updated for comprehensiveness.

How to Query a SQL Database:
Make sure that you have a database management application (ex. MySQL Workbench, Sequel Pro).
If not, download a database management application and work with your company to connect your database.
Understand your database and its hierarchy.
Find out which fields are in your tables.
Begin writing a SQL query to pull your desired data.
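The steps above can be sketched end to end with Python's built-in sqlite3 module and an in-memory database, as a stand-in for whatever connection your company sets up. Everything in the sample data is hypothetical. SQLite also does not support MySQL's SHOW TABLES or DESCRIBE, so the sketch uses SQLite's own equivalents (the sqlite_master catalog and PRAGMA table_info) for the "understand your hierarchy" and "find your fields" steps.

```python
import sqlite3

# Stand-in database with a few made-up rows (not real data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people_massachusetts "
    "(first_name TEXT, last_name TEXT, hair_color TEXT, birth_date TEXT)"
)
conn.executemany(
    "INSERT INTO people_massachusetts VALUES (?, ?, ?, ?)",
    [
        ("Jane", "Doe", "red", "2003-05-14"),
        ("Amy", "Adams", "red", "2003-11-02"),
        ("Raj", "Patel", "brown", "2003-07-30"),
        ("Lee", "Chan", "red", "1999-01-20"),
    ],
)

# Understand the hierarchy: list the tables
# (SQLite's equivalent of MySQL's SHOW TABLES).
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    )
]

# Find out which fields are in the table
# (SQLite's equivalent of MySQL's DESCRIBE); column 1 is the field name.
fields = [
    row[1] for row in conn.execute("PRAGMA table_info(people_massachusetts)")
]

# Write the query: red-haired people born in 2003, in alphabetical order.
# The ? placeholders keep the values separate from the SQL text.
rows = conn.execute(
    "SELECT first_name, last_name FROM people_massachusetts "
    "WHERE hair_color = ? AND birth_date BETWEEN ? AND ? "
    "ORDER BY last_name LIMIT 100",
    ("red", "2003-01-01", "2003-12-31"),
).fetchall()

print(tables)
print(fields)
print(rows)
```

Dates are stored here as ISO-formatted text (YYYY-MM-DD), which sorts correctly as plain strings. A real database may use a dedicated date type, so check the format before relying on BETWEEN.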
Why Use SQL?
How to Write Simple SQL Queries
Understand the hierarchy of your database
Basic SQL Queries
SELECT
FROM
WHERE
BETWEEN
AND
OR
NOT
ORDER BY
GROUP BY
ORDER BY VS. GROUP BY
LIMIT
INSERT INTO
UPDATE
DELETE
Bonus: Advanced SQL Tips
* (asterisk)
% (percent symbol)
LAST 30 DAYS
COUNT
AVG
SUM
MIN and MAX
JOIN
CASE
Basic SQL Queries Marketers Should Know
Originally published Mar 21, 2022 7:00:00 AM, updated March 21 2022