news categories
Some of the newsfeeds can be viewed by category; choose one of the subjects below.Tech- & Geek news
News 4 (techs &) geeksTechnology news
Linux
Linux newsLinux Central headlines
Linux software announcements
Linux tips'n'tricks
Other OS'es
BEOS softwareBSD news
Mozilla
Mozilla relatedIT
SECURITYTelephony/VOIP
Networks
Internet Technology News
Webdev
WebdevelopmentWebdev::css
Webdev::javascript
WHO news
WHO newsfeedsDutch news
Dutch newsDutch weblogs
Dutch weblogsMore dutch weblogs
Podcasts
Dutch podcastsradiocast.nl
Planet PostgreSQL
url: http://planetpostgresql.orgSteve Singer: PGCon2012 ? Making Maps
Slides from my 2012 PGCon presentation Making your Own Maps are now available
The presentation covered
- Common reasons people render their own maps
- Where to get OpenStreetMap data and how to load it into your PostGIS database
- How to use Tilemill to design your own map style
- How to render map tiles, both statically and dynamically
- How to use OpenLayers to display your map
The presentation was recorded. I will update this post when the recording comes online.
I’ve been a regular attendee of PGCON since the first year it was held in Ottawa. Like past years I enjoyed the conference and I would like to thank Dan Langille for putting together another first-rate conference. My favourite part of PostgreSQL conferences is meeting and reconnecting with users people in the community.
Josh Berkus: Testing 9.2: Autovacuum Logging
Since PostgreSQL 9.2 is in beta now, I've been using some generously donated cloud server time on GoGrid to test various features in the new release. One of the ones which is working brilliantly is the new, more verbose logging for Autovacuum.
Greg Smith and Noah Misch added additional information to the logging you get when you set log_autovacuum_min_duration. This includes information on the work done by autovacuum (buffers, pages and tuples read and written), as well as information on when autovacuum "skips" because it can't get a lock. Here's a sample:
LOG: automatic analyze of table "bench2.public.pgbench_tellers" system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec
LOG: automatic vacuum of table "bench2.public.pgbench_branches": index scans: 0
pages: 0 removed, 2 remain
tuples: 114 removed, 200 remain
buffer usage: 44 hits, 0 misses, 3 dirtied
avg read rate: 0.000 MiB/s, avg write rate: 45.334 MiB/s
system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec
LOG: automatic analyze of table "bench2.public.pgbench_branches" system usage: CPU 0.00s/0.00u sec elapsed 0.02 sec
LOG: automatic analyze of table "bench2.public.pgbench_tellers" system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec
This will go a long way towards helping users troubleshoot autovacuum in the future. Great work, Greg and Noah!
Leo Hsu and Regina Obe: PostgreSQL 9.2 native json type support
One new welcome feature in PostgreSQL 9.2 is the native json support and companion row_as_json and array_as_json functions. PostGIS also has a json function for outputting geographies and geometries in GeoJson format which is almost a standard in web mapping.
Here is an example of how you'd use the new feature - create our test table
CREATE TABLE test(gid serial primary key, title text, geog geography(Point,4326));
INSERT INTO test(title, geog) VALUES('My neck of the woods', ST_GeogFromText('POINT(-71.057811 42.358274)'));
INSERT INTO test(title, geog) VALUES('some other neck of the woods', ST_GeogFromText('POINT(42.358274 -71.057811 )'));
Now with a command like this I can output all of my data as a single json object.
SELECT array_to_json(array_agg(t))
FROM test As t;
But there is a tincy little problem. My output looks like this:
[{"gid":1,"title":"My neck of the woods","geog":"0101000020E61000005796E82CB3C3
51C0E98024ECDB2D4540"}
,{"gid":2,"title":"some other neck of the woods","geog":"0
101000020E6100000E98024ECDB2D45405796E82CB3C351C0"}]
But to follow the geoJson standard, my geography object should output like this:
{"type":"Point","coordinates":[-71.057811000000001,42.358274000000002]}
Continue reading "PostgreSQL 9.2 native json type support"
Bruce Momjian: The Evolution of pg_upgrade
Pg_upgrade (documentation, presentation) was created in 1998 to allow in-place major version upgrades of Postgres. It was written as a shell script, as were many of our command-line administration tools at the time. (Postgres didn't support Windows natively until 2005). In those early years, Postgres changed its on-disk format frequently in major releases, causing pg_upgrade to be only occasionally useful for major version upgrades (6.5, 7.1, disabled in 2002).
Postgres feature additions soon made the shell script method unworkable, and binary-upgrades were unsupported for many years. In 2006, EnterpriseDB took on the job of rewriting pg_upgrade in C, a task that had been discussed many years on the mailing lists — this email thread, titled "Upgrading rant", give you an idea of the tone of previous discussions.
In 2009, EnterpriseDB realized that a successful binary-upgrade utility had to be community-developed, so they changed the license from GPL to BSD and encouraged me to work with the community to improve the tool. In 2010, pg_upgrade was added to the official Postgres 9.0 release. Postgres 9.1 saw only performance improvements. Postgres 9.2 will improve the user interface and error reporting. It is significant that pg_upgrade has not required large changes to support new major releases, only minor improvements.
Chris Travers: LedgerSMB now in Debian!
As Robert James Clay announced today:
The 1.3.15-2 package for LedgerSMB has been accepted into Debian
unstable. I expect that version of the package to migrate to Debian
testing (wheezy) by the end of this month.
LedgerSMB Debian source package page:
http://packages.qa.debian.org/l/ledgersmb.html
Besides becoming available in the Debian repositories, information
about downloading the current package manually is also available at:
http://packages.debian.org/sid/all/ledgersmb/download
Leo Hsu and Regina Obe: PostgreSQL 9.2 pg_dump enhancements
One of the things I'm excited about in PostgreSQL 9.2 are the new pg_dump section - pre-data, data, and post-data options and the exclude-table-data option. Andrew Dunstan blogged about this briefly in pg_dump exclude table data. What is also nice is that pgAdmin III 1.16 supports the section option via the graphical interface
. I was a bit disappointed not to find the exclude-table-data option in pgAdmin III interface though.
The other nice thing about this feature is that you can use the PostgreSQL 9.2 dump even against a 9.1 or lower db and achieve the same benefit.
The 9.2 pg_restore has similar functionality for restoring specific sections of a backup too.
So what is all this section stuff for. Well it comes in particularly handy for upgrade scripts. I'll first explain what the sections mean and a concrete example of why you want this.
- pre-data - this would be the table structures, functions etc without the constraints such as check and primary key and indexes.
- data -- it's uhm the data
- post-data - This is all constraints, primary keys, indexes etc.
Continue reading "PostgreSQL 9.2 pg_dump enhancements"
Hans-Juergen Schoenig: PostgreSQL operating system accounting ? CPU info
Once cgroups are in place and ready for action you can not just enjoy the kernel
side limitations which make sure that only a certain amount of hardware is used
but you can also make use of the Linux kernel?s massive analytical capabilities.
One major issue with any software is that everything which is managed by the
database itself can nicely be tracked. PostgreSQL provides some nice system
views to extract a vast amount of information. However, it is usually a bit hard
to track problems down to the kernel level and see what is really going on in
the operating system.
cgroups offer a simple and nice to use interface to grab information from the
OS. Basically information is exposed via a special filesystem. In our sample we
have mounted cgroups to /cgroup/main:
[root@lola main]# pwd /cgroup/main [root@lola main]# ls -l total 0 -r--r--r--. 1 root root 0 Apr 11 14:57 blkio.io_merged -r--r--r--. 1 root root 0 Apr 11 14:57 blkio.io_queued -r--r--r--. 1 root root 0 Apr 11 14:57 blkio.io_service_bytes -r--r--r--. 1 root root 0 Apr 11 14:57 blkio.io_service_time -r--r--r--. 1 root root 0 Apr 11 14:57 blkio.io_serviced -r--r--r--. 1 root root 0 Apr 11 14:57 blkio.io_wait_time --w-------. 1 root root 0 Apr 11 14:57 blkio.reset_stats -r--r--r--. 1 root root 0 Apr 11 14:57 blkio.sectors -r--r--r--. 1 root root 0 Apr 11 14:57 blkio.throttle.io_service_bytes -r--r--r--. 1 root root 0 Apr 11 14:57 blkio.throttle.io_serviced -rw-r--r--. 1 root root 0 Apr 11 14:57 blkio.throttle.read_bps_device -rw-r--r--. 1 root root 0 Apr 11 14:57 blkio.throttle.read_iops_device -rw-r--r--. 1 root root 0 Apr 11 14:57 blkio.throttle.write_bps_device -rw-r--r--. 1 root root 0 Apr 11 14:57 blkio.throttle.write_iops_device -r--r--r--. 1 root root 0 Apr 11 14:57 blkio.time -rw-r--r--. 1 root root 0 Apr 11 14:57 blkio.weight -rw-r--r--. 1 root root 0 Apr 11 14:57 blkio.weight_device -r--r--r--. 1 root root 0 Apr 11 14:57 cgroup.procs -rw-r--r--. 1 root root 0 Apr 11 14:57 cpu.cfs_period_us -rw-r--r--. 1 root root 0 Apr 11 14:57 cpu.cfs_quota_us -rw-r--r--. 1 root root 0 Apr 11 14:57 cpu.rt_period_us -rw-r--r--. 1 root root 0 Apr 11 14:57 cpu.rt_runtime_us -rw-r--r--. 1 root root 0 Apr 11 14:57 cpu.shares -r--r--r--. 1 root root 0 Apr 11 14:57 cpu.stat -r--r--r--. 1 root root 0 Apr 11 14:57 cpuacct.stat -rw-r--r--. 1 root root 0 Apr 11 14:57 cpuacct.usage -r--r--r--. 1 root root 0 Apr 11 14:57 cpuacct.usage_percpu -rw-r--r--. 1 root root 0 Apr 11 14:57 memory.failcnt --w-------. 1 root root 0 Apr 11 14:57 memory.force_empty -rw-r--r--. 1 root root 0 Apr 11 14:57 memory.limit_in_bytes -rw-r--r--. 1 root root 0 Apr 11 14:57 memory.max_usage_in_bytes -rw-r--r--. 1 root root 0 Apr 11 14:57 memory.memsw.failcnt -rw-r--r--. 1 root root 0 Apr 11 14:57 memory.memsw.limit_in_bytes -rw-r--r--. 1 root root 0 Apr 11 14:57 memory.memsw.max_usage_in_bytes -r--r--r--. 1 root root 0 Apr 11 14:57 memory.memsw.usage_in_bytes -rw-r--r--. 1 root root 0 Apr 11 14:57 memory.move_charge_at_immigrate -rw-r--r--. 1 root root 0 Apr 11 14:57 memory.soft_limit_in_bytes -r--r--r--. 1 root root 0 Apr 11 14:57 memory.stat -rw-r--r--. 1 root root 0 Apr 11 14:57 memory.swappiness -r--r--r--. 1 root root 0 Apr 11 14:57 memory.usage_in_bytes -rw-r--r--. 1 root root 0 Apr 11 14:57 memory.use_hierarchy -rw-r--r--. 1 root root 0 Apr 11 14:57 notify_on_release -rw-r--r--. 1 root root 0 Apr 11 14:57 release_agent -rw-r--r--. 1 root root 0 Apr 11 14:57 tasks drwxr-xr-x. 2 root root 0 Apr 20 03:25 user_0004 drwxr-xr-x. 2 root root 0 Apr 20 03:25 user_0005
We can easily extract this information using a simple ?cat?:
[root@lola main]# cat cpuacct.usage_percpu 19745669099020 17321579548118 16193177855463 12703461924936
In this specific example we have extracted information about CPU usage. As you
can see we got 4 CPUs in our system. Each number will give us the amount of CPU
time burned by every single CPU core. To monitor stuff you can pump this
information into a monitoring software of your choice.
In our example we have created two cgroups ? one for every PostgreSQL instance
on the system. In side those user_* directories you will find similar
information again:
[root@lola user_0004]# cat cpuacct.usage_percpu 61114810788 53897888870 54165636817 24571480812
Those numbers only represent the CPU time burned by our specific instance. It
allows us to nicely drill down to the information needed. Using cgroups you can
see which component of your system needs how many resources. Thus, excellent
information for debugging and error checking is available. From inside the DB it
is usually impossible to tell what was done by which CPU and when using how much
I/O and so on.
Peter Eisentraut: Base backup compression options
I've been looking at my PostgreSQL base backups. They are run using the traditional
tar -c -z -f basebackup.tar.gz $PGDATA/...
way (many details omitted). I haven't gotten heavily into using pg_basebackup yet, but the following could apply there just as well.
I had found some of the base backups to be pretty slow, so I dug a little deeper. I was surprised to find that the job was completely CPU bound. The blocking factor was the gzip process. So it was worth thinking about other compression options. (The alternative is of course no compression, but that would waste a lot of space.)
There are two ways to approach this:
Use a faster compression method.
Parallelize the compression.
For a faster compression method, there is lzop, for example. GNU tar has support for that, by using --lzop instead of -z. It gives a pretty good speed improvement, but the compression results are of course worse.
For parallelizing compression, there are parallel (multithreaded) implementations of the well-known gzip and bzip2 compression methods, called pigz and pbzip2, respectively. You can hook these into GNU tar by using the -I option, something like -I pigz. Alternatively, put them into a pipe after tar, so that you can pass them some options. Because otherwise they will bring your system to a screeching halt! If you've never seen a system at a constant 1600% CPU for 10 minutes, try these.
If you have a regular service window or natural slow time at night or on weekends, these tools can be quite useful, because you might be able to cut down the time for your base backup from, say 2 hours to 10 minutes. But if you need to be always on, you will probably want to qualify this a little, by reducing the number of CPUs used for this job. But it can still be pretty effective if you have many CPUs and want to dedicate a couple to the compression task for a while.
Personally, I have settled on pigz as my standard weapon of choice now. It's much faster than pbzip2 and can easily beat single-threaded lzop. Also, it produces standard gzip output, of course, so you don't need to install special tools everywhere, and you can access the file with standard tools in a bind.
Also, consider all of this in the context of restoring. No matter how you take the backup, wouldn't it be nice to be able to restore a backup almost 8 or 16 or 32 times faster?
I have intentionally not included any benchmark numbers here, because it will obviously be pretty site-specific. But it should be easy to test for everyone, and the results should speak for themselves.
Devrim GÜNDÜZ: Freshly baked, served directly from oven: Please welcome 9.2 RPMs
I was at PGCon this week. I will try to write a separate blog post about this -- but anyway, an advantage of being at a conference that is thousands of miles away from your home is you are not being disturbed at all -- so I took this opportunity to finish 9.2 RPMs.
So, here is the big, detailed, extra long announcement of 9.2 RPMs
As of 9.2, I am dropping support for RHEL/CentOS 4, since Red Hat already EOLed RHEL 4. Similarly, 9.2 will only be available for Fedora 15 and above. I will keep maintaining older branches, until we EOL 9.1
There are probably still some bugs -- let me know if you find one!
Joe Abbate: PostgreSQL Extensions and Pyrseas
Prompted by Peter Eisentraut’s blog post, I’ve finished adding support for PG 9.1 EXTENSIONs to the Pyrseas dbtoyaml and yamltodb utilities. For now, this is only available on GitHub.
In order to deal with procedural languages, which are now created as extensions, the utilities now fetch the pg_catalog schema (previously deemed uninteresting for the purpose of version control). The output of dbtoyaml from a freshly created 9.1 database (assuming no customizations via template1) is now:
schema pg_catalog: description: system catalog schema extension plpgsql: description: PL/pgSQL procedural language version: '1.0' schema public: description: standard public schema
This could be changed easily to exclude pg_catalog (which will now also appear against 8.4 and 9.0 databases) before the next Pyrseas release.
I’m hoping some brave, adventurous or simply interested souls will help test the additions. Please report any issues on GitHub.
Filed under: PostgreSQL, Version control
Bruce Momjian: Report from PGCon
PGCon 2012 is now over. Surprisingly, all the talks that excited me were from the second/last day. My morning started with WAL Internals Of PostgreSQL (slides) by Amit Kapila. I rarely see this topic covered, and his comparisons to Oracle's transaction logging were fresh. Also in the morning, Hooks in PostgreSQL by Guillaume Lelarge: I also rarely see this covered. It highlighted the C-language plug-in capabilities of Postgres.
My afternoon highlight was Big Bad "Upgraded" PostgreSQL by Robert Treat. After the occasional maligning of pg_upgrade (often in jest), it was great to hear a big pg_upgrade success story. (This OmniTI blog entry has the details. This talk was also presented at PG Corridor Days DC.)
Andrew Dunstan: Productive Hallway Track
One of the best things about conferences is that you can often deal with some problems quite quickly. For example, some people were unhappy about having to supply a dummy dynamic library while doing a binary upgrade from the JSON 9.1 backport. In discussion with Bruce Momjian he and I quickly worked out that this is really a deficiency in pg_upgrade, which should not be looking at libraries used by functions in pg_catalog at all, since these won't be dumped by pg_dump. So we'll be able to deal with this case very simply. From a quick look this seems like a one line fix in pg_upgrade. Bruce and I also worked out some details of how we could set up automated cross-version testing of pg_upgrade via the buildfarm, and this should be working some time later this year. All in all this was a very useful 15 minutes or so.
Jignesh Shah: PgCon 2012: OLTP Performance Benchmarks Overview
Christophe Pettus: Running PostgreSQL on AWS
My presentation from PGCon 2012, PostgreSQL on AWS with Reduced Tears, is now up.
Andrew Dunstan: pgbadger looks cool
The biggest thing that piqued my interest yesterday at pgCon was the lightning talk on pgbadger. I'm certainly going to take a look at this as a substitute for pgfouine, which I have sadly found largely unusable in most cases where I want it.
Bruce Momjian: Parallel Query Report from the PGCon Developer Meeting
As part of yesterday's PGCon Developer Meeting, I hosted a discussion about adding resource parallelism to Postgres, which I blogged about previously. Josh Berkus has kindly summarized the results of that discussion, and I am hoping to help track and motivate progress in this area in the coming months.
Andrew Dunstan: Adding enum labels isn't transactional
Back when we added a facility to add new labels to an enum, some technical considerations made us adopt a restriction that this must not take place within a transaction.block. David Wheeler has been complaining about this lately, including cornering some of us in the pub last night. His chagrin is understandable, since all the other changes in his migration scripts.are transactional and can be rolled back. Unfortunately this isn't something that is easy to overcome. What we can do, and should for 9.3, is to make the label addition command more resilient by adding an "if not exists" option. That way if the label has already been added to the type the command won't generate an error.
Guillaume LELARGE: pgAdmin 1.16 beta 1 is out
At the same time PostgreSQL 9.2 beta 1 was released, so did pgAdmin 1.16 beta 1 (with an unfortunate copy/paste error on the URL to download beta 1).
We don't have yet installers for Windows, and packages for Mac OS X. They will come at a later time.
What you can expect from pgAdmin 1.16 is complete compatibility with PostgreSQL 9.2. All the new features of PostgreSQL 9.2 are handled by pgAdmin 1.16. You can use pgAdmin 1.14 with PostgreSQL 9.2, but you won't get all the new parameters, and objects. Moreover, you'll have issues with the tablespaces node (one column disappears in the pg_tablespace system catalog).
One thing you shouldn't expect from pgAdmin 1.16, and I'm really sorry about that, is the database designer. This really cool tool isn't ready for prime time yet. We still need more time to fix the bugs, to make sure it's usable on every plateform, and to do all the little things that make a product great. You can still try it if you want, but you'll need to compile pgAdmin (look for the --enable-databasedesigner ./configure option).
I'm working right now on a visual tour for 1.16, but it may take some time before it gets out.
During the beta period, I'll also fix the bugs testers will encounter, and work on the documentation.
So, as you see, still a lot of work. But don't hesitate to install pgAdmin 1.16 beta 1, search for bugs, and report them on our mailing lists, so that we can have a really great 1.16.0 release.
Bruce Momjian: Template1 Me!
You might have heard of the template1 database before, or seen it in the output of pg_dumpall:
REVOKE ALL ON DATABASE template1 FROM PUBLIC; …
Jeff Frost: Slony Upstart Script for Ubuntu 10.04 LTS
Since Ubuntu is moving towards using upstart instead of SysV init scripts, I thought I'd write a quick upstart script for starting slony compiled and installed in /usr/local/slony. Upstart 1.4+ supports setuid, but the version of upstart included with Ubuntu 10.04LTS is only 0.6.5 and even on Ubuntu 11.10, it's only 1.3, so instead I use start-stop-daemon's --chuid switch. The script should be installed as /etc/init/slony.conf. Here's the script:
---cut here---
# slony - slon daemon
#
# slon is the slony replication daemon
description "slony slon daemon"
start on net-device-up
stop on runlevel [06]
# can't use setuid until upstart 1.4
#setuid slony
# which is why we use start-stop-daemon with the --chuid switch
respawn
script
exec start-stop-daemon --start --chuid slony --exec /usr/local/slony/bin/slon -- -f /etc/slon.conf >> /var/log/slony/slony.log 2>&1
end script
---cut here---
This will have the slon daemon run as the slony user and log to /var/log/slony/slony.log. You'll also need a valid slony config file in /etc/slon.conf.
You can also find the script in the pgexperts upstart-scripts git repo:
https://github.com/pgexperts/upstart-scripts
Peter Eisentraut: My (anti-)take on database schema version management
There were a number of posts recently about managing schema versions and schema deployment in PostgreSQL. I have analyzed these with great interest, but I have concluded that they are all more or less significantly flawed. (Of course, most of these solutions do in fact work for someone, but they are not general enough to become canonical go-to solutions for this problem class, in my opinion.) I have developed a list of elimination criteria by which I can evaluate future developments in this area. So here are some of the things that I don't want in my schema version management system:
Using schemas for distinguishing multiple versions (like this, but that's actually more about API versioning). That simply won't work for deploying objects that are not in schemas, such as casts, languages, extensions, and, well, schemas.
Using extensions (like this). Well, this could work. But extensions by themselves do nothing about the core problem. They are just an SQL wrapper interface around upgrade scripts. You still need to write the upgrade scripts, order them, test them, package them. The extension mechanism might replace the, say, shell script that would otherwise run the upgrade files in a suitable order. Another issue is that extensions require access to the server file system. Changing this is being discussedas "inline extensions", but there is no consensus. This is a smaller problem, but it needs to be thought about. Also, I do need to support PostgreSQL 9.0 and earlier for little while more.
Requiring naming each change (patch names, like this). Naming things is hard. Numbering things is easy. And how many good names are you going to still be able to come up with after 100 or so changes?
Take a lesson from file version control systems: versions are numbers or, if it must be, hashes or the like (UUIDs have been suggested).
Using a version control tool for tracking upgrade paths (like this). Sqitch, unlike the initial draft of this concept, doesn't actually require a version control tool for deployment, which wouldn't have worked for me, because what we ship is a tarball or a deb/rpm-type package. But it still requires you to maintain some kind of sanity in your version control history so that the tool can make sense out of it. That sounds fragile and inconvenient. The other choice appears to be writing the plan files manually without any VCS involvement, but then this loses much of the apparent appeal of this tool, and it's really no better than the "naming each change" approach mentioned above.
Taking snapshots or the like of a production or staging or central development system. Production systems and staging systems should be off limits for this sort of thing. Central development systems don't exist, because with distributed version control, every developer has their own setups, branches, deployments, and world views.
You could set it up so that every developer gets their own test database, sets up the schema there, takes a dump, and checks that in. There are going to be problems with that, including that dumps produced by
pg_dumpare ugly and optimized for restoring, not for developing with, and they don't have a deterministic output order.Storing the database source code in a different place or in a different manner than the rest of the source code. This includes using a version control system like mentioned above (meaning storing part of the information in the version control meta information rather than in the files that are checked into the version control system in the normal way), using a separate repository like Post Facto, or using something like the mentioned staging server.
The source is the source, and it must be possible to check out, edit, build, test, and deploy everything in a uniform and customary manner.
Allowing lattice-like dependencies between change sets (like most examples cited above). This sounds great on paper, especially if you want to support distributed development in branches. But then you can have conflicts, for example where two upgrades add a column to the same table. Depending on the upgrade path, you end up with different results. As your change graph grows, you will have an exploding number of possible upgrade paths that will need to be tested.
There needs to be an unambiguous, canonical state of the database schema for a given source checkout.
Requiring running through all the upgrade scripts for a fresh deployment (like this). There are two problems with this. First, it's probably going to be very slow. Fresh deployments need to be reasonably fast, because they will be needed for automated tests, including unit tests, where you don't want to wait for minutes to set up the basic schema. Second, it's inefficient. Over time, you might drop columns, add new columns, delete rows, migrate them to different tables, etc. If you run through all those upgrade scripts, then a supposedly fresh database will already contain a bunch of rubble, dropped attributes, dead rows, and the like.
Therefore, the current version needs to be deployable from a script that will not end up replaying history.
Using metalanguages or abstraction layers (like Pyrseas or Liquibase or any of the metaformats included in various web frameworks). It'd probably a good idea to check some of these out for simple applications. But my concern is whether using an abstraction layer would prevent me from using certain features. For example, look at the Pyrseas feature matrix. It's pretty impressive. But it doesn't support extensions, roles, or grants. So (going by that list), I can't use it. (It's being worked on.) And in a previous version, when I looked at it for a previous project, it didn't support foreign-data functionality, so I couldn't use it then either. And those are just the top-level things the author thought of. Actually, the Pyrseas author has gone through some effortto have almost complete coverage of PostgreSQL DDL features, so give this tool a try. But it won't be for everyone.
So, I'm going to keep looking.
Magnus Hagander: Call for Papers - PostgreSQL Conference Europe 2012
The call for papers for PostgreSQL Conference Europe 2012 in Prague, the Czech Republic has now been posted. As usual, we are looking for talks on all topics related to PostgreSQL. At this point, we are looking for submissions for regular conference sessions - we will post a separate call for papers for lightning talks at a later time.
We are also still looking for sponsors - please see our website for details about the sponsor benefits and the costs.
Follow the news feed on our site, or our Twitter feed, for news updates!
David Wheeler: Use of DBI in Sqitch
Sqitch uses the native database client applications (
psql,sqlite3,mysql, etc.). So for tracking metadata about the state of deployments, I have been trying to stick to using them. I’m first targeting PostgreSQL, and as a result need to open a connection topsql, start a transaction, and be able to read and write stuff to it as migrations go along. The IPC is a huge PITA. Furthermore, getting things properly quoted is also pretty annoying — and it will be worse for SQLite and MySQL, I expect (psql’s--setsupport is pretty slick).
Martin Pitt: Debian/Ubuntu Packages for PostgreSQL 9.2 Beta 1
The first Beta of the upcoming PostgreSQL 9.2 was released yesterday (see announcement). Your humble maintainer has now created packages for you to test. Please give them a whirl, and report any problems/regressions that you may see to the PostgreSQL developers, so that we can have a rock solid 9.2 release.
Remember, with the postgresql-common infrastructure you can use
pg_upgradeclusterto create a 9.2 cluster from your existing 8.4/9.1 cluster and run them both in parallel without endangering your data.For Debian the package is currently waiting in the NEW queue, I expect them to go into experimental in a day or two. For Ubuntu 12.04 LTS you can get packages from my usual PostgreSQL backports PPA. Note that you need at least postgresql-common version 0.130, which is available in Debian unstable and the PPA now.
I (or rather, the postgresql-common test suite) found one regression: Upgrades do not keep the current value of sequences, but reset them to their default value. I reported this upstream and will provide updated packages as soon as this is fixed.
Hans-Juergen Schoenig: PostgreSQL resource consumption limitations
PostgreSQL offers a great deal of flexibility when it comes to resource
allocation and efficient usage of hardware capacities. This allows fancy
monitoring of what PostgreSQL does in production. Sometimes, however, better
operating system integration is needed and helps to use hardware even more
efficiently.
On Linux one way to limit or allocate resources is to use the Linux kernel?s
cgroup interface. cgroups and PostgreSQL are a perfect team for ?
- more efficient monitoring
- restriciting use of resources
- resource accounting.
How do cgroups work?
On a Linux system cgroups (= control groups) are basically a way to group
various operating system processes into groups. A group like that can then be
assigned to a certain amount of RAM, a fraction of a CPU or to a certain number
of I/O operations.Processes can be moved to a group manually or with the help of a daemon which
moves processes to the right group automatically given some user-defined rules.
It is easily possible to move all processes belonging to a user to some special
cgroup and apply constraints to it. Following this approach you can move an
entire PostgreSQL instance and all its processes to a certain cgroup.A simple config could look like that (/etc/cgconfig.conf):
mount { cpu = /cgroup/main; cpuacct = /cgroup/main; memory = /cgroup/main; blkio = /cgroup/main; } group user_0004 { perm { admin { uid = root; gid = root; } task { uid = user_0004; gid = user_0004; } } memory { memory.limit_in_bytes = 256M; } cpu { cpu.shares = "75"; } } group user_0005 { perm { admin { uid = root; gid = root; } task { uid = user_0005; gid = user_0005; } } memory { memory.limit_in_bytes = 512M; } cpu { cpu.shares = "125"; } }
We got two users which form separate groups. Each group will get certain
fractions of the system?s CPU as well as a certain amount of memory. In other
words: You can box a database instance nicely inside a group.In the next step you can nicely put processes into a group. This is best done
using /etc/cgrules.conf:
user_0004 * /user_0004 user_0005 * /user_0005
In our case every user?s process is automatically moved to the right process on
startup.More on cgroups will be available in further blog posts.
Michael Paquier: Postgres-XC 1.0beta2 is released
Postgres-XC, write-scalable multi-master symmetric cluster based on PostgreSQL, version 1.0beta2 has been released. This beta version is based on PostgreSQL 9.1.3. All the patches in PostgreSQL 9.1 stable branch have been merged up to commit 1c0e678 (4th of May 2012). You can download the tarball directly from here. This tarball contains all the HTML and [...]
Chris Travers: Thoughts on VoltDB and performance vs flexibility
VoltDB is the latest project of Mike Stonebreaker (original founder of Postgres) aimed at very high performance OLTP. It is fairly clearly a niche product however, and not really usable in the ERP space for reasons I will discuss below.
Stonebreaker's own presentations (one example, at 11:25) suggest he is aware of this issue given his triangle framework.
In the vertices of the triangle he places NoSQL, high performance OLTP (like VoltDB) and column-oriented databases, while in the second, as lower performance generalists, what he calls the legacy databases or elephants are in the middle. The argument is that you can get a significant performance boost by specializing your systems beyond what you could when you have every system doing everything. This is a fair argument to a point, but the implementation details show that it is true only in some areas.
Stonebreaker's analysis of processing time (see above presentation, at 14:36) in traditional databases places virtually all processing time in four areas, namely buffer management, locking, thread locks/semaphores, and recovery tasks, and he suggests that in order to get high degrees of performance one must eliminate these tasks. This requires, however, rethinking how we address concurrency (the locking issues) and durability (recovery and disk-based storage). The VoltDB approach is to get rid of concurrency entirely and see durability as a network, rather than a system, property. Concurrency elimination accepted because the db is fast enough that queries can be run one at a time, and intraquery parallelism can be used instead of interquery parallelism. However, this imposes significant limitations on the database because it means that every transaction is limited to a single query. You can't do round-tripping in your transactions because this would impose locking requirements on the database.
This approach works very well for certain kinds of processing, such as consuming rapid data feeds and then feeding that information into a data warehouse at specified intervals., However one thing that is generally missing from the discussion is that the more complex the application, the more general the database solution needs to be. One solution might be to separate data entry from reporting and use multiple tools, but who wants to pull their trial balance from a different system than they enter invoices on? Since ERP systems usually tightly integrate decision support and OLTP, there isn't really room here to use specialized databases like VoltDB.
As the database market has expanded, it has created niches for products like VoltDB. These niches may indeed grow with time. However, I think it is incorrect to treat generalists as legacy, outdated approaches.
In essence VoltDB typifies what I call a specialism paradox, namely that to perform well in a specialist niche one must give up generalist solutions. It is true that sometimes things improve and supplant older systems but at least as often they only do when the systems are not vastly more complex than the ones they are replacing. For example, Stonebreaker's comparison of high performance OLTP to the shift from cable-operated digging excavation equipment to hydrolics misses the fact that hydrolic systems are not that much more complex than cables and such. There is some additional complexity, but it isn't a huge gap. A better comparison might be comparing semi trucks to race cars and locomotives. These are all highly engineered systems, but the basic design considerations are very different, and specialization comes at the cost of flexibility. You will never see race cars hauling twenty-foot containers of furniture cross-country, and you will never see travelling along our nation's highways to reach out-of-the-way towns. The semi however will never pull the loads that the locomotive can, and will never win the Indie 500...... The semi truck, however, can go fast enough, and pull the loads it needs to pull.....
Andrew Dunstan: Upgradable JSON
If you're using my backport of the 9.2 JSON feature in your 9.1 Postgres, you have a problem if you want to use pg_upgrade when moving to release 9.2: the type has the same name and behaviour in 9.1 as in 9.2, but the OIDs are different and so the upgrade will fail. If you use pg_dump/pg_restore, you should just be able to exclude the restoration of the extension and everything will "just work".
But after a lot of trial and error I have come up with a different way to do things. In this scenario we don't install as an extension, and instead we borrow a bit of magic from pg_upgrade to make sure that the JSON type and its array type have the same OIDs as they will have in PostgreSQL 9.2. We also need to fool pg_upgrade when it runs, as it goes looking for a loadable module called json.so, and of course in 9.,2 there isn't one because JSON is built in. So we need to supply a dummy one. But, given that, this too "just works".
There might be some wrinkles I haven't foreseen, but I have done this quite a few times now without mishap. This is checked in on a separate branch of the json_91 repo callled "binup"
If you are going to use this backport, I also highly recommend using the backport of the fix for missing column names on row expressions. This requires patching Postgres, but it's worth doing anyway, even if you're not using JSON - notably it affects some uses of hstore as well. You can get this by pulling the port into a git repo set at REL9_1_STABLE by doing:Enjoy.git pull https://bitbucket.org/adunstan/pgdevel.git rowexpr91
Hitoshi Harada: PL/coffee Trial
We are getting to the final stage of the first stable release of PL/v8. This should be a good release with a lot of improvement such like
- Subtransaction support
- Better name space of built-in functions
- OO style cursor, prepared plan
- find_function
- Start-up procedure with GUC
- Separate context in user switch
- A series of bug fixes.
And these days I'm getting increasing number of interests and feedback via email, SNS, plv8 project page. Upcoming PGCon will at least have two talks that may cover the usage of PL/v8.
In the mean time, there was a long standing feature request of PL/coffee, a procedural language in CoffeeScript, the dialect of JavaScript. CoffeeScript is only a source-source transformation, and the compiler is provided as a tiny JavaScript, so I tried to let the compiler to transform plv8 source if the language is "plcoffee".
=# CREATE OR REPLACE FUNCTION public.fibonacci(n integer) RETURNS integer LANGUAGE plcoffee IMMUTABLE STRICT AS $function$ fibonacci = (x)-> return 0 if x == 0 return 1 if x == 1 return fibonacci(x-1) + fibonacci(x-2) return fibonacci n $function$; CREATE FUNCTION =# select fibonacci(10); fibonacci ----------- 55 (1 row)As I noted above, this is only a source transformation, so the engine is still v8, and the runtime environment and other stuff are shared with plv8. Of course it can reference plv8 function via find_function(). The plcoffee EXTENSION is separate and by default it's off. If you are interested in it, clone the latest source and say
make ENABLE_COFFEE=1 installIt works and it attracts more interest from the people around some area. However, it seems it is a little controversial to use CoffeeScript (of course not in the database usage context, but in the web browser and server context) that the source transformation is quite difficult to debug since the problems like run time error emitted by the JS engine is far from the actual script source, which is so annoying. And since I embed the compiler JS source file as a object file symbol, the size of plv8 shared object gets from around 200k byte to 600k byte. Thus, for now I leave it as an "experimental" feature and want to see if it gets popular or not.
Bruce Momjian: Postgres 9.2 Draft Release Note Published
I have completed the Postgres 9.2 release notes I started seven days ago. Writing the release notes is always a taxing experience. I have to basically dedicate an entire week to the process of digesting 1100 commit messages to produce 3000 lines of SGML text. Once I am done though, it is rewarding to see the finished product. It is like working on a painting all year, and spending a frustrating week framing it and hanging it on the all — once it is hung, you stand and admire it, and forget much of the sweat it took to produce.
![]()
Curious about how the 9.2 release item count compares to previous major releases? Here are the results:
Release Items 9.2 245 9.1 213 9.0 252 8.4 330 8.3 237 8.2 230 8.1 187 8.0 238 7.4 280

