Subjectively

dd if=/dev/random | kirk > blog

Subjectively header image 2

How big would my table be in PostgreSQL

November 7th, 2013 · No Comments · PostgreSQL

Sometimes you want to make some guesses about the size of data on disk long before you have the data in hand. Here’s a simple way to define a structure using a VALUES list, and then do some quick estimates of how big the resulting table would be when extrapolated to the expected number of rows. Be aware that this technique doesn’t estimate the size of indexes (possibly larger than the data itself), free space map, visibility map, or any other associative data. That makes the projection a little rough, but it should get you into the ballpark.

Of course, the simplest way (and probably the more accurate) to find out how much space per row is consumed on disk would be to create the table, add a few rows, and use the pg_relation_size() function to tell you how big it is.  However, over the years I’ve had a few oddball situations where creating a table was not feasible due to various business operational reasons.

If all you need is a quick estimate, you can create a VALUES clause that mimics the table definition. Here’s a quick example:

1
2
3
4
5
6
7
8
SELECT * 
FROM (
VALUES
 (1::INTEGER ,'RA'::text ,now()::DATE, 1::INTEGER)
,(2, 'SA', now()::DATE -1, 2)
,(3, 'C', now()::DATE -2, 4)
)  AS mytable 
	(application_id, application_status, status_date, fk_status_id)

Then write a wrapper query around it like so:

1
2
3
4
5
6
7
8
9
10
11
12
WITH stats AS (
SELECT avg(LENGTH(ROW(mytable.*)::text)+23)::NUMERIC(12,2) avg_row_size
	, COUNT(*) ROW_COUNT
FROM (
VALUES
 (1::INTEGER ,'RA'::text ,now()::DATE, 1::INTEGER)
,(2, 'SA', now()::DATE -1, 2)
,(3, 'C', now()::DATE -2, 4)
)  AS mytable (application_id, application_status, status_date, fk_status_id)
)
SELECT stats.*, (8192/avg_row_size)::NUMERIC(12,2) avg_rows_page, (avg_row_size * ROW_COUNT)::NUMERIC(12,2) data_size_bytes
FROM stats;

The “+23″ represents the normal PostgreSQL tuple overhead.
Add as many rows/columns as you think are necessary to create representative data for the table. Of course the length(row()) statement will return the size in character bytes (12345 is 5 bytes, not 4 or 8). When you get your hands on some actual data, you can refine the estimate by using the built-in PostgreSQL functions for data size.
The results will look something like this:

avg_row_size row_count avg_rows_page data_size_bytes
41.67 3 196.59 125.01

This could tell you whether a table is worth indexing (too small to bother or large enough to need indexes with quals) or it might estimate whether or not the table should participate in replication at some later date.

Simple multiplication can now tell you what the roughly expected data size will be for larger tuple counts.

Tags:

No Comments so far ↓

There are no comments yet...Kick things off by filling out the form below.

Leave a Comment

You must log in to post a comment.