Not quite open source

I found a couple of posts over on Planet MySQL today about some new announcement from the dolphin-herders. Apparantly, the Sun/MySQL combination is now starting to push at least some features for Enterprise customers only. AFAIK this isn't entirely new, since they've had some tools previously only available to enterprise customers, but I guess this would be a first for actual server code. (Though the way patches are moved between Enterprise and Community never really made sense to me, that's probably because I didn't study it in enough detail)

From what I can tell, this puts MySQL more towards the situation that PostgreSQL has with EnterpriseDB (and their now renamed product Postgres Plus Advanced Server) and some other companies. This is a proprietary product that sits on top of an open source foundation that is PostgreSQL. Now, EnterpriseDB initially marketed this product as opensource - which was incorrect, and they got quite a bit of bad press about it. The question is, what will Sun/MySQL do now. Will they continue to market their Enterprise product as Open Source, which it obviously isn't, or will they change that?

It is also interesting to note that this would not be possible if MySQL was licensed as pure GPL. With the current model, this is only possible for Sun/MySQL to do, and no other company. Unlike PostgreSQL which is BSD licensed, and allows any company to create this kind of proprietary derived product. If I was a contributor to MySQL (which I'm not), I would consider this quite unfair. But that's what you get with the dual licensing model used by MySQL, and this is why contributors have to sign away their rights to Sun/MySQL. It's not surprising to me that this happens, and it's one of the reasons I haven't liked the MySQL licensing model from the start. The other side of the coin would be a pure GPL license, which would not allow anybody at all to create a proprietary derived product - but that's a whole different debate to take on...

So is this a dealbreaker? Well, I don't know exactly which features they are talking about in this case (since I'm not a Sun/MySQL partner), but there was talk of it being things around backups. IMHO, such extremely core functionality really should be in the main product - which is how PostgreSQL has it today. But more importantly, if this is the new path Sun/MySQL is going to be heading down with their products, I can see how that will be a problem for a number of their customers.

And if you're one of them - you're always welcome to cross over and join us over here by the elephant...

PostgreSQL Conference East - day 2

So I'm here at PostgreSQL Conference East in College Park, MD. Didn't have time to do any blogging yesterday, so someone managed to blog about it before me... JD somehow managed to make me one of the semi-official camera-in-your-face people (photographer, if that wasn't painfully clear), so I've also been running around getting a bunch of pictures so far. They'll be up on my galleries later when I've managed to look through them a bit to remove the most crappy ones - and I'll try to get them onto the shared photo stream as well.

So far, the conference has been very good. Most talks I've been to are really high quality and interesting stuff, and the organization has worked very well (except possibly for Bruce sending Denis off to the wrong end of campus, but he made it in time for dinner so things ended up fine).

Starting at the day before yesterday, there was a "small evening event" with food and drinks at Chevys. There were quite a number of people there for being a "small" event... But it was all good and fun. JD managed to convince the staff at the place that it was David Fetters birthday, so he got both singing, icecream (I think it was) and a sombrero... During the panel today it was confirmed that his actual birthday is sometime in october, so it apparantly wasn't even close.

http://www.smugmug.com/photos/272240753_DjKS4-S.jpg alt="Josh Drake keynote" align="right" hspace="2" vspace="4" /Yesterday started with Josh Drake's keynote about the state of the community. I followed that up with a talk about PostgreSQL from Java, which was a good set of information for a total Java newbie (that's me!) Last talk before lunch was "Inside the PostgreSQL Buffer Cache" by Greg Smith. A very useful talk about measuring and tuning this very central part of PostgreSQL. I expected it to be interesting, but I'm also taking away some very direct hints on what to look at and how to interpret it that I didn't really expect.

After lunch I started with Greg Sabino Mullanes talk about the different replication options available for PostgreSQL, which also got some pretty good discussion going. Followed that up with "logic and databases", which I'll admit I spent about half of in an interesting discussion i the hallway instead. But I'm sure it was good. Finished off the day with "Postmodern PostgreSQL application development". That turned out to be mainly about ORM in Python, which honestly didn't really interest me, but I did take away some good tips about python/qt which I will obviously have to look into a bit in the future. And it did create a bit of interesting discussion with David on the way back to the hotel later.

http://www.smugmug.com/photos/272295095_razSs-S.jpg alt="EDB sponsored dinner" title="There is a story behind this situation. But that will remain between those present..." align="left" vspace="4" /After stopping by the hotel to drop off some things, it was off to the EnterpriseDB sponsored dinner and drinks party at Jasper's. Good food, decent drinks (c'mon, this is still america, you have to actually search harder to find a place with good beer options), and some good discussion with both new and "old" people. Oh, and David Fetter had a birthday again so the staff got to sing for him...

So far today we started with the panel discussion which was decent, I'd say. It could've been better, but it could also have been a lot worse. So no complaints in general. After that, we did a proper group photo (it will be available on the website as usual later, with names on it). Right now I'm in "Portable Scale-out Benchmarking for PostgreSQL" with Robert Hodges. Did a quick stop in the XML talk, but since I'm not really interested in that stuff right now, I moved on.

FOSDEM day 2

...is not over yet. There are still things to go. But I've finished off my two talks (one on "win32 development in a unix project" (no, there was no code at all in this) and one on "building search.postgresql.org" (very little code in this one). I think they went reasonably well, we'll see what others say when I check around some more...

Not quite as many people came to our booth today, at least not the time I was there, but there were still a lot of people. We had more talks instead, so I'd still classify the day as a big success.

Gabriele kicked off in the morning with an interesting talk, that unfortunately didn't have as many visitors as it deserved, because it was early in the morning (9AM).

Bruce had a talk about the PostgreSQL future, which probably wasn't what people expected. I think most people were expecting a roadmap for 8.4, which isn't at all what we received - but it was still a good talk.

Susanne did her encoding talk and a talk on how people can contribute to the PostgreSQL community. Unfortunately, I missed both of those. David also had a talk about DBI-link that I also missed - some people had to man the booth, and since there was no scheduled time for lunch some talks simply had to be skipped.

Simon has had one talk, focusing on enterprise features that he will be working on to get into the next version(s) of PostgreSQL, and he just started his second talk which is on the topic of high availability. He also arranged a short "roundtable discussion" with the BSD folks on what they could do in the OS to help PostgreSQL. Nothing concrete from such a short meeting of course, but hopefully it laid some foundation for more work in the future.

We had an auction for one of the big elephants, which really didn't end up where we expected it - because we had only a single bidder. Probably the timing could've been a lot better since there were talks going on - we need to think about that one next time.

In about half an hour it's time for the vote for the first board of directors for the new PostgreSQL Europe organization. Going to be interesting to see how that works out, and of course what the end result of it is.

Good times at FOSDEM

It's now the morning of the second day of FOSDEM, and my third day in Brussels. Right now, we have a talk from Gabriele Bartolini about the Italian PUG. Yesterday was packed with several good talks from a lot of different people, and a really packed booth. I've never seen a pg booth that had this many people or this much stuff (flyers, folders, t-shirts, mugs, those lovely elephants, pins, etc etc). Lots of visitors, and lots of interesting questions.

Outside the conference, I arrived on thursday evening, and was picked up at the airport by Susanne (thanks a bunch!). Met up with Bruce and Luke Momjian at the hotel, and went to dinner with a couple of MySQL guys (one which happened to be Susannes boyfriend, which I guess explains some of the connection there..). On friday we did some touristing around Brussels, and I took the chance to do some pg work while Bruce fell asleep more or less in the middle of a sentence - I guess we blame jetlag... Susanne was off to pick up Stefan at the airport, but since he gave her his departure time instead of arrival time, the day ended up being a bit delayed...

For the evening, we met up with a whole lot of other people (Dave, Greg, Heikki, Simon, Gevik, David, and several more) for our own beer event since the experience from previous years says that the big FOSDEM one is just too crowded. Had dinner at the same place.... Food and drink was good, but it was a bit of a boring atmosphere. On the other hand, the company was good and the discussions interesting...

Second day had us setting up our two tables in the booth in the morning, stacking it with all our nice merchandise. We sold a lot of things during the whole day, and pretty much ran out of the cool plush elephants. Most of the time the booth was packed with both PostgreSQL people manning it and visitors interested in talking to the PostgreSQL people, reading through our flyers, or buying our stuff.

In the afternoon, the talks in the devroom started. David Fetter was off with the keynote which was good, but some of it was lost because he never got the projector working properly from his laptop. This was followed up by an interesting talk by Stefan on the PostgreSQL infrastructure (ok, I knew most of that, but it was still good!). Simon gave a talk on some of the new performance features in 8.3, and he really could've used much more time to go through more of them - there are just so many things to talk about in that area. Bruce finished off the days pg talks by a repeat of his "Decade of PostgreSQL" talk that he did in Toronto - always a good talk.

In the evening we filed out to a restaurant fairly close to the hotel - the biggest problem was finding one that actually had space for all of us. We ended up at an Italian place that actually also served some Belgian specialties. Some people went home from that, and the rest of us ended up having some beers and chatting about PostgreSQL, The World and Everything.

Now off to focus on todays talks. And get ready for my own, which is later on during the day.

PostgreSQL vs 64-bit windows

Several completely independent people have asked me this in email already, so I figured it's about time for a blog post on the topic of PostgreSQL on 64-bit versions of Windows.

First for the simple answer: No, there is no 64-bit version of PostgreSQL for Windows. PostgreSQL has supported 64-bit environments on Unix for many years (long before we had x64 to make it available for wintel machines), but there is no Win64 port. Yet. And given the way that PostgreSQL is developed, there is no firm date for when this will be available.

Until then, though, the 32-bit version of PostgreSQL runs perfectly fine on 64-bit Windows. And doing so may actually be a better idea in some cases than a native 64-bit version. But most importantly, the architecture of PostgreSQL makes the requirement for a native 64-bit version much less than for many other competing databases. Particularly:

  • PostgreSQL relies on the operating system to do much of the caching of data. Since the main limitation of running a 32-bit process is the total amount of memory being addressable, a system that is based on the database engine doing all the caching cannot address all the memory on a system with say 16Gb memory. For PostgreSQL, we leave much of that caching to the operating system which can address it, as long as the OS is 64-bit.

  • PostgreSQL uses a multi-process architecture, not multi-thread. I'm not going to get into the discussion about which is better, just summarize by saying there are good and bad points to both of them. In this case it's a good thing. In a multi-threaded database server, all client connections share the memory space, and are again limited by the total addressable memory region. With PostgreSQL, you could easily have more than 1Gb per backend (if necessary) without running out of memory, which significantly pushes away the point when you need to go to 64-bit.

So why would it ever be better to run a 32-bit version than a 64-bit version? Simply because a 64-bit version is more memory-hungry. Every pointer used in the system takes twice as much space. Every int takes twice as much space. And if you're not actually going be able to cash in on the benefits (by addressing loads of memory, for example), it's a very real cost.

I'm well aware that this is a fairly simplistic view on the 32- vs 64-bit situation in general, but it should be enough to explain why 32-bit PostgreSQL on 64-bit Windows isn't such a bad idea after all. And of course, this only applies if you are using an x64 version of Windows - with Itanium it's a whole different story.

Eight-point-three

Despite a couple of people pre-announcing it, PostgreSQL 8.3 was actually fully announced and released about 40 minutes ago. Announce email should be going out, and the press release is available. Instead of just reiterating the new features, I'll point you to the list of cool new stuff. Or check the release notes.

If you haven't tested your apps with it during beta or RC time, now is the perfect time to do it!

There, I just had to get this one in before Devrim did. Ha! I win this time!

Workaround for ident security issue

Some people, like Josh Berkus for example, claim that both trust and ident can cause the security issues with dblink. Which is true - they can. But unlike the blanket statements that they are insecure, there are ways around it.

Trust will always be insecure when dblink or similar functions are present. But there's a simple way to make ident not be insecure - just make sure that the postgres superuser does not have the same username as the OS user that runs the processes. In fact, make sure that there is no PostgreSQL user with the same username as the OS user.

initdb will make those userids the same by default, but you can change that using the -U parameter to initdb. In theory you can rename the superuser after the fact as well, but I haven't tried that so I don't know how safe it is...

Faking the dbo role

Long-time users of SQL Server will be familiar with the dbo user. This is a special user that's present in all databases, and represents the database owner - regardless of who that is. Another special thing about it is that all users of the server role System Administrators will be mapped to this user as well. A third special thing is that unlike all other objects, those owned by dbo don't need to be qualified with the owner (think schema) name before they are accessed.

Now, there are a lot of legacy applications out there that depend on this behavior. To make those work, you often have to deal with these three issues. There are obviously a lot of other things to deal with as well, but those are not on the topic of this post.

The first part is easy. Just GRANT the dbo role to the users as needed.

The third part is also easy - use the public schema instead. Make sure that the dbo role has permissions to write to this schema. This will work as long as the application doesn't schema-qualify the names. If it does, you'll need to create a schema called dbo and grant permissions on it to the dbo role.

I've previously had problems with the second part, until I tried something that's actually fairly obvious once you think about it. But I never did before, so here goes. When user joe logs in, even if you grant him CREATE permissions on the public schema, anything he creates will end up owned by joe and not dbo. To get around this, just run the following command for all users:

ALTER USER joe SET ROLE dbo;

That way, whenever joe logs in, the current role will be automatically switched to dbo, and all objects created will be properly owned by dbo.

New security updates are out

I've just approved the website updates for new security updates across all branches. Depending on how you use your database, these updates can be critical for updating or not at all. You should go look at the announcement right away to see if you need to upgrade! (if the link is a 404, you need to wait for the mirrors to sync up which should take less than an hour)

The same bugs are present in the 8.3 betas, and fixes will be included in 8.3RC1, due to be announced shortly.

Also, note that this it the EOL release for 7.3 - if you're still on 7.3, you need to upgrade [HTML_REMOVED]now[HTML_REMOVED].

Database or schema

A couple of weeks back I helped a guy out with a fairly large PostgreSQL installation, with several things. One of them prompted me to write this post. A fairly common question is "should I use databases or schemas to isolate my users". A fairly common answer is "schemas is the same as databases in other products", which isn't actually true (at least not for all of them). Schemas in PostgreSQL can be very flexible though - much more so than in say SQL Server, where a schema and a user is pretty much the same thing. But there are reasons for using databases, and reasons for using schemas. This was a textbook example on why to use schemas instead of databases.

The system they run has the database hosted in on their server, and a bunch of clients of different kinds (windows fat clients connecting directly and mobile devices connecting through an appserver). The system was set up with one database for each client, and all connecting using the postgres superuser account (oops. but that's a different thing).

An interesting point in this was that apart from the dedicated information for each client, there was a shared address table that had about 300Mb (IIRC) of data in it. This was 99.9%25 the same data between all clients, but each client had the ability to add extra rows to the table.

One of the things I ended up doing with them was to change the model to a single database shared by all clients, with one schema for each client. By giving each client their own account instead of using the superuser account, almost no changes were necessary in the client application - they would automatically see the tables in their own schema (since it's named the same as the account), but have no access to the data from other clients. There were several advantages gained by this:

  • Less overhead, since there were much fewer databases.
  • The ability to do cross-schema queries when needed, to perform cross-client statistical calculations (not by the clients themselves, but from the provider)
  • Less databases to remember to back up ;)

With that step done, the next step was very logical. We moved the address table to the public schema, and replaced it with a view in each of the client schemas. This view was a UNION ALL of the address table in the public schema, and a table called address_local that lived in each clients schema. This table was populated with all the entires the clients had added to their local copy. A couple of RULEs on the view would redirect modifications to the view to hit the local table (since the clients only had SELECT permissions on the shared table, and they really shouldn't modify that). Later on we changed the application to write directly to the address_local table, purely for performance reasons. This brought in several more advantages:

  • There is now only one copy of this large, shared table in the database. That greatly increases the probability of it staying in the cache, as well as reduces the risk that a scan on this table will push other tables out of the cache.
  • The address data is updated occasionally from an external source. Now it only has to be updated once, which will affect all clients
  • Backups are a lot smaller since the address data is only backed up once.

These are all very good arguments for using schemas instead of databases. But what are the downsides?

  • You can't use pg_hba.conf to restrict database access. In this case, that wasn't a reason, since they were all using the same account before. But it's a feature that will no longer work.
  • LISTEN/NOTIFY is per database. So the name of the NOTIFYs must change to include the name of the client
  • Restoring a single client can be somewhat more complex

I'm sure there are other arguments on both the positive and the negative side, but those are the ones we came across.

Conferences

I speak at and organize conferences around Open Source in general and PostgreSQL in particular.

Upcoming

FOSDEM+PGDay 2019
Feb 1-3, 2019
Brussels, Belgium
Nordic PGDay 2019
Mar 19, 2019
Copenhagen, Denmark

Past

PGConf.Asia 2018
Dec 10-12, 2018
Tokyo, Japan
DC PostgreSQL Users Group
Nov 14, 2018
Washington DC, USA
New York City PostgreSQL User Group
Nov 13, 2018
New York City, NY, USA
Driving IT 2018
Nov 2, 2018
Copenhagen, Denmark
PGConf.EU 2018
Oct 23-26, 2018
Lisbon, Portugal
More past conferences