PostgreSQL
Contents
Useful Links
What Every PostgreSQL User Needs to Know
- MySQL is probably the worst database written in the history of mankind, and that's saying a lot, because there have been some pretty crappy DB's written in the 70's.
- You NEED to read the documentation on configuring your PostgreSQL installation. Otherwise, it WILL be slow. It doesn't take a lot of work to understand what the configurations control and how to set them to the right numbers.
- You NEED to run
VACUUM ANALYZE
from time to time on your tables. You should setup a cron job to do so, or use one of the available tools to automatically vacuum tables when they need it. If you do only a vacuum, your queries will always be poorly planned. If you don't vacuum at all, then your entire database will crawl to a halt.
Other than that, you should be set to go.
Why I Love PostgreSQL
- PostgreSQL has a type system that works. It is extensible, it makes sense, and it is a pleasure to use.
- PostgreSQL allows you to program in several different languages in the database. You can also write your own extensions writing C.
- PostgreSQL's codebase is wonderful to work with.
- PostgreSQL doesn't have a rollback segment like Oracle.
- PostgreSQL is really, really scalable. That is, as the number of concurrent connections go up, it does not slow down exponentially like MySQL.
Why People Hate PostgreSQL
- They think PostgreSQL hasn't progressed any since 10 years ago.
- They think it is slow because they haven't configured it properly.
- They think it is hard because it doesn't try to hide the reality of ACID compliance from the user.
- They think it is hard because it is not MySQL, Microsoft SQL Server, or Oracle.
- They think it is dumb because you have to run vacuum from time to time.
- They think they know better than the planner because it prefers to run table scans rather than index lookups when you are faster doing a table scan anyway.
- They think the planner is dumb because they haven't analyzed the data to give the planner hints on how to plan queries yet.
- They think it is dumb and stupid because all their money and business is riding on the success of their favorite database which is costing them millions and doesn't perform nearly as well as PostgreSQL.
- They think it isn't enterprise ready because it doesn't require thousands of tools and books to keep the system running. (Apparently, enterprise-ready means you need a team of DBAs just to keep the thing alert, according to some people.)
- They think it is slow because PostgreSQL runs on top of the OS file partitions.
- They think that PostgreSQL users are snobby because they talk about data normalization, ACID compliance, and nested SELECT statements.
Why Doesn't PostgreSQL Use the Index?
The number 1 question people ask when using PostgreSQL is:
- Why doesn't PostgreSQL use the index?
The answer is one or more of the following:
- You didn't run VACUUM ANALYZE and so PostgreSQL has no idea how big your table is.
- Using the index is actually slower than a table scan for your particular query.
- Example: You have 5 rows in the table. Looking at 5 rows is faster than looking at the index and then looking at the 1 row you want.
- Example: You have 5,000.000 rows in the table. ANALYZE suggests that 3,000,000 rows would match that query. It's faster to run through the rows sequentially than randomly because that's the way hard disks work, and so it prefers a sequential scan.
- You messed up the configuration, telling PostgreSQL that sequential scans are much cheaper than they really are.
Cool Things
- hstore - A hash/dict/mapping datatype in PostgreSQL, with appropriate operators. http://www.sai.msu.su/~megera/postgres/gist/hstore/README.hstore
- My notes on using PostgreSQL as a local cache: PostgreSQL/Local Cache