Most of the web applications I’ve worked on use an SQL database as their storage mechanism. Even when there’s a layer of code outside the database managing the data (which is the usual case these days), I often find it helpful to use database commands to see what’s actually been stored. Maybe I’m trying to craft a report or figure out whether data is being saved when a user fills out a form, or maybe I want to migrate information from one system to another.

An actual request I received at a job (paraphrased):

“The customer has an existing web site with user accounts, comments, and uploaded documents. They would like to transfer as much as possible into a web site running our software. They don’t have anyone who knows how it’s programmed, but they can give us the root password.”

I’m Not From Around Here, So Where Do I Start?

Database tutorials tend to start out with SELECT. SELECT is, in fact, what I use most, and it’s part of the ANSI SQL standard. But how do you know what’s available to SELECT FROM? Maybe you’ve seen reference to a specific table name in the code or documentation for your project, but maybe not!

(My examples will use MySQL-specific commands. Other SQL database programs have equivalents for most of these. Check the documentation. Table 1 of this Linux Journal article shows equivalent commands for three popular clients.

Show Me the Tables!

mysql> SHOW TABLES;

+-----------------------+
| Tables_in_amykfarrell |
+-----------------------+
| ak_commentmeta        |
| ak_comments           |
| ak_links              |
| ak_openid_identities  |
| ak_options            |
| ak_postmeta           |
| ak_posts              |
| ak_term_relationships |
| ak_term_taxonomy      |
| ak_terms              |
| ak_usermeta           |
| ak_users              |
+-----------------------+

(PostgreSQL equivalent: \dt)

MySQL has just shown me the names of the tables in the database. This happens to be a simple WordPress database schema (I already know that). If I’m interested in data about users, I might start inspecting ak_users next. Knowing that this is a WordPress database, I think ak_posts looks interesting, too!

What if Those are Not the Tables I’m Looking for?

Usually I can start with SHOW TABLES because I’m already looking at the right database schema, due to how I connected to the database. Notice how I skipped right over making the connection? How you do that varies; you might identify the schema you want as part of the connection, but you might not, and some applications span multiple schemas. (A schema is a collection of database objects, like tables. The term also refers to the structure of database objects. In practical terms, a schema is like a container for your database stuff.)

There’s a hint at the top of the mysql output above—the heading “Tables_in_amykfarrell” tells me I’m looking at a schema called “amykfarrell”. To see all the schemas (that I have permission to view), I can use SHOW DATABASES (MySQL happens to refer to SQL schemas as “databases”; this isn’t standard usage).

mysql> SHOW DATABASES;

+--------------------+
| Database           |
+--------------------+
| information_schema |
| amykfarrell        |
| drupal7            |
| mysql              |
| performance_schema |
| test               |
+--------------------+

(PostgreSQL equivalent: \l)

Apparently Drupal 7 has been installed on this server as well—or somebody has a funny way of naming things!

I could go on to SHOW TABLES IN drupal7 to see if what I want is in there.

Okay, Can We Look at Some Data Now?

Well, you could jump in and try SELECT * FROM ak_posts. But let’s assume we have a lot of data and we want to be a bit more, well, selective. Here’s how we find about a bit more about this table:

mysql> DESCRIBE ak_posts;

+-----------------------+---------------------+------+-----+---------------------+----------------+
| Field                 | Type                | Null | Key | Default             | Extra          |
+-----------------------+---------------------+------+-----+---------------------+----------------+
| ID                    | bigint(20) unsigned | NO   | PRI | NULL                | auto_increment |
| post_author           | bigint(20) unsigned | NO   | MUL | 0                   |                |
| post_date             | datetime            | NO   |     | 0000-00-00 00:00:00 |                |
| post_date_gmt         | datetime            | NO   |     | 0000-00-00 00:00:00 |                |
| post_content          | longtext            | NO   |     | NULL                |                |
| post_title            | text                | NO   |     | NULL                |                |
| post_excerpt          | text                | NO   |     | NULL                |                |
| post_status           | varchar(20)         | NO   |     | publish             |                |
| comment_status        | varchar(20)         | NO   |     | open                |                |
| ping_status           | varchar(20)         | NO   |     | open                |                |
| post_password         | varchar(20)         | NO   |     |                     |                |
| post_name             | varchar(200)        | NO   | MUL |                     |                |
| to_ping               | text                | NO   |     | NULL                |                |
| pinged                | text                | NO   |     | NULL                |                |
| post_modified         | datetime            | NO   |     | 0000-00-00 00:00:00 |                |
| post_modified_gmt     | datetime            | NO   |     | 0000-00-00 00:00:00 |                |
| post_content_filtered | longtext            | NO   |     | NULL                |                |
| post_parent           | bigint(20) unsigned | NO   | MUL | 0                   |                |
| guid                  | varchar(255)        | NO   |     |                     |                |
| menu_order            | int(11)             | NO   |     | 0                   |                |
| post_type             | varchar(20)         | NO   | MUL | post                |                |
| post_mime_type        | varchar(100)        | NO   |     |                     |                |
| comment_count         | bigint(20)          | NO   |     | 0                   |                |
+-----------------------+---------------------+------+-----+---------------------+----------------+

(PostgreSQL equivalent: \d+ ak_posts)

This tells me a bit about each field (or column) defined in the table. I can see where I might look for a user by name or email address. Maybe I should get an idea of how much data there is, too:

mysql> SELECT COUNT(*) FROM ak_posts;

+----------+
| COUNT(*) |
+----------+
| 37       |
+----------+

(PostgreSQL equivalent: the same! This command is ANSI SQL, meaning it’s standard across SQL databases.)

In MySQL, I can get a bit more information about this table by showing the CREATE command for it:

mysql> SHOW CREATE TABLE ak_posts;

+----------+-------------------------------------------------------
| Table    | Create Table
+----------+-------------------------------------------------------
| ak_posts | CREATE TABLE `ak_posts` (
 `ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
 `post_author` bigint(20) unsigned NOT NULL DEFAULT '0',
 `post_date` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
 `post_date_gmt` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
 `post_content` longtext NOT NULL,
 `post_title` text NOT NULL,
 `post_excerpt` text NOT NULL,
 `post_status` varchar(20) NOT NULL DEFAULT 'publish',
 `comment_status` varchar(20) NOT NULL DEFAULT 'open',
 `ping_status` varchar(20) NOT NULL DEFAULT 'open',
 `post_password` varchar(20) NOT NULL DEFAULT '',
 `post_name` varchar(200) NOT NULL DEFAULT '',
 `to_ping` text NOT NULL,
 `pinged` text NOT NULL,
 `post_modified` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
 `post_modified_gmt` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
 `post_content_filtered` longtext NOT NULL,
 `post_parent` bigint(20) unsigned NOT NULL DEFAULT '0',
 `guid` varchar(255) NOT NULL DEFAULT '',
 `menu_order` int(11) NOT NULL DEFAULT '0',
 `post_type` varchar(20) NOT NULL DEFAULT 'post',
 `post_mime_type` varchar(100) NOT NULL DEFAULT '',
 `comment_count` bigint(20) NOT NULL DEFAULT '0',
 PRIMARY KEY (`ID`),
 KEY `post_name` (`post_name`),
 KEY `type_status_date` (`post_type`,`post_status`,`post_date`,`ID`),
 KEY `post_parent` (`post_parent`),
 KEY `post_author` (`post_author`)
) ENGINE=MyISAM AUTO_INCREMENT=91 DEFAULT CHARSET=utf8
+----------+-------------------------------------------------------

(PostgreSQL equivalent: none that I know of, but \d+ provides the details that MySQL omits from DESCRIBE—just not in “Create Table” format.)

The additional information is at the end:

  • ID is the primary key (as I would have guessed from its name and the fact that it’s an auto-incrementing integer; this is a common pattern).
  • Several columns are indexed (KEY) for faster search. One KEY indexes a group of four columns—so that combination is probably important!

FOREIGN KEY (relating a field in this table to a field in another) and other constraints might appear here.

If you’re trying to construct a statement to create a new table, this command can be a nice way to get examples from your existing database.

Getting Just a Sample

If I’m really unfamiliar with the data in a table, I like to just look at a few rows. Sometimes I’ll look up the “most recent” few. Here, I’ll just look at those columns of interest I saw in the type_status_date index:

mysql> SELECT ID, post_type, post_status, post_date FROM ak_posts
ORDER BY ID DESC LIMIT 3;

+----+-----------+-------------+---------------------+
| ID | post_type | post_status | post_date           |
+----+-----------+-------------+---------------------+
| 66 | revision  | inherit     | 2013-07-28 22:09:55 |
| 65 | page      | publish     | 2013-07-28 22:10:10 |
| 64 | revision  | inherit     | 2013-07-28 20:13:48 |
+----+-----------+-------------+---------------------+

I’m taking advantage of the auto-incremented ID column to let me find the last few entries in the table. Immediately we can guess that this table has entries for revisions as well as published posts. “Inherit” might mean something interesting; looking back at the full list of fields I see that this table also has a “post_parent” field, which probably identifies a relationship between posts or between posts and … something else. And yes, we can see that there haven’t been any new posts since 2013, alas.

Where Next?

After running commands like this on a few tables, I’ll have enough clues to start constructing SELECT queries to look at actual data. I keep an eye out for things like:

  • Columns in one table that appear to refer to another table, either by naming convention (for example, a column named “othertable_id”), or by a FOREIGN KEY constraint.
  • Columns of datetime type or int(11) with a name suggesting time. (It’s a common convention to save the time when an entry was created as a Unix timestamp—a 32-bit integer.) These can be really helpful if I’m interested in activity within a certain timeframe, or if I just want to see whether a table has had recent updates.
  • Names that mirror class names in the application (if I have code to look at) or in the user interface. These help me get my bearings, but beware—terminology tends to change most frequently in the UI, and most slowly in the database schema!

I hope this gives you some idea of how to get the lay of the land in an unfamiliar database schema. Happy data-spelunking!