Viewing entries tagged with postgresql. Return to full view.

Replicating from MS SQL Server to PostgreSQL

Note: I'm sorry, all screenshots were lost in the planet crash. But I managed to recover the text, hopefully it helps someone.

There can be many reasons for wanting to replicate your data from a MS SQL Server installation to your PostgreSQL installation. For example, as a step in migration or to be able to use PostgreSQL features for data analysis while not having to touch existing clients working with MSSQL. For me in this case, I wanted to use tsearch2 to search some fulltext data, because the fulltext indexer in SQL Server really isn't very good.

It turns out that SQL Server ships with replication functionality that can solve this problem with relatively little pain (depending on your schema of course), providing full transactional replication. It's master/slave only, and SQL Server will be the master, but it's still quite useful. Here are the steps to do this for a simple example database - should work for more complex database as well of course. It expects you to set up a user named sqlrepl in the PostgreSQL database, that the replication system will use to connect with. Make sure that this user has permissions to connect from the SQL Server machine in pg_hba.conf.

  • Make sure you have the PostgreSQL ODBC drivers installed on the SQL Server machine (I'm using version 8.01.02).

  • Create the example databases:

  • In SQL Server:
            CREATE DATABASE origin
            go
            USE origin
            go
            CREATE TABLE tab1(
            id int identity not null primary key,
            t varchar(128) not null)
            INSERT INTO tab1 (t) VALUES ('Test 1')
            INSERT INTO tab1 (t) VALUES ('Test 2')
  • Then, in PostgreSQL:
            CREATE DATABASE slave OWNER sqlrepl;
            \connect slave
            CREATE TABLE tab1(id int not null primary key, t varchar(128) not null);
            ALTER TABLE tab1 OWNER TO sqlrepl;
  • Create a ODBC System Datasource on the SQL Server. This is done using the odbcad32 command. Make sure that you create a system datasource, and make sure you use the PostgreSQL ANSI driver (there are some problems with the UNICODE driver in the way SQL Server uses it)

  • Start SQL Server Enterprise Manager. Create a new linked server. This is done by right-clicking the Linked Servers node under Security and picking New Linked Server. Enter the name of the linked server (in all uppercase, in our case PGSLAVE), and pick the driver Microsoft OLE DB Provider for ODBC Drivers. Note that you should not pick the PostgreSQL ODBC driver here. Finally, enter the name of the ODBC datasource just created. Make sure the link works by clicking the Tables node and verify that you can see the tables of you database.

  • Configure the subscriber:

  • Right-click on the Replication node and select Configure publishing, subscribers, and distribution.
  • Select the tab Subscribers
  • Click New
  • Select OLE DB data source
  • Pick the linked server you created (PGSLAVE). Re-enter the login information.
  • Click OK and close all dialogs

  • Create the publication:

  • Right-click Publication under Replication and select New publication
  • Select your database, click Next
  • Select Transactional publication, click Next
  • Uncheck SQL Server 2000 and check Heterogeneous data sources, click Next
  • Click Article Defaults
  • Open the Snapshot tab
  • Change name conflicts to Keep existing table unchanged. In some cases it will work with drop and recreate, but I prefer creating the tables manually to make sure there is no mixup with datatypes and such (considering MSSQL doesn't really know about PostgreSQL datatypes)
  • Click OK
  • Put a checkbox on the tables to replicate (tab1 in this example). Click Next
  • You will get a warning about IDENTITY properties not being replicated to subscribers. This will happen if you have any IDENTITY columns in your table. In most cases, you can just ignore it. Click Next
  • Possibly modify description if you want to, click Next
  • Click next through the rest of the Wizard, and click Finish
  • Create the subscription:
  • Open properties for the publication
  • Open the Subscriptions tab
  • Click Push new
  • Select the created subscriber (PGSLAVE)
  • Click Next through the rest of the wizard, make sure to check the box for Start snapshot agent
  • Now sit back and watch your MSSQL data first being bulk-loaded into PostgreSQL, and then transactionally replicated (you will see a couple of seconds delay after a commit, same as when you replicate between two SQL Servers).

If you have more advanced needs (which you often do), you can use most other features of SQL Server replication, such as row and column filtering.

Npgsql and async notifications?

If this and this is what I think it is, that's the best news all week. I've really been waiting for this. (And no, I'm not talking about Franciscos surgery, if that's what you thought). I've still got a single perl program running in one project waiting for this one.

Need to get this blog on planetpostgresql.org so I get the info faster...

Linuxforum summary

Better late than never, almost a week since it was over, a quick summary of this years Linuxforum Denmark.

I have to say I'm very impressed. With an organisation of volounteers and at a very low cost compared to other conferences (such as OSCON which is definitly a lot higher budget), I think they managed to arrange a fantastic conference.

As a visitor, there were plenty of interesting talk, and this year the english language ones were nicely spread out so there was almost always one available. I don't really do Danish. Reading is fine, but listening to a technical talk - well, it's a bit too much of a challenge. I think the split of a business-oriented day (friday) and a tech day (saturday) is a smart thing to do. So I'll definitly recommend people who are around go go to this conference next year.

As a speaker the arrangement was very nice - not just "here's your room, be done", but all the things around it. I certainly felt very welcome, and I don't think there was a single point not being taken care of.

As for my own talk, I think it got to be a little bit too much in too little time - I should've cut a few more points out of it and been a bit more detailed about the ones left in. But hopefully it wasn't too bad.

So, big thanks to the organisers. And to all the interesting people I got to chat with - hosts and other speakers. A very interesting couple of days.

Oh, and yes, I got stopped in airport security. Not for the powerdrill they gave us (speakers) at the conference (very handy, btw!), but for the speakers badge. And yes, they were in the same bag...

At last - the password bug is found

There has been a very elusive bug around in the PostgreSQL client since 8.0 (at least) on win32. In some cases (until now unidentified) password authentication in psql stopped working, and just errored out with a message indicating the wrong password was used. And nobody has found the cause...

... until now, that is. Seems Robert Kinberg finally nailed it. At least, I firmly beleive that the bug he did identify is the one responsible for most of these reports.

Turns out that if you have a directory called \dev on the same drive you run psql from (normally c:, but that can differ), psql will attempt to use a file called /dev/tty. On Unix, this is common behaviour - we want to be sure we read the password from a tty. On windows, it just breaks - the first time, we will write the prompt into the file, and on later runs we will read back the prompt and use it as a password.

The fix is simple, just two lines changed and then two added #ifdefs. The fix is scheduled to go into 8.1.4 and 8.0.8. It's not actually approved and committed yet, but I see no reason why it won't be :-)

If you're having this problem today, check for dev directories in your root, and remove/rename/removepermissions that directory, and things should be back to normal.

So, big thanks to Robert for identifying this and suggesting the fix!

Monitoring PostgreSQL on Win32

A very common question from Win32 users is how to monitor Postgres the same way you can do with ps on Unix. There's a simple trick to it - don't use Task Manager, use Process Explorer from Sysinternals. So why is this. For comparision, here is a snapshot of my laptop with PostgreSQL running, and just a single client connection:

Task Manager screenshot

The same processes, as seen from Process Explorer:

Process Explorer screenshot

The most obvious difference - you can now see the process tree. You will see pg_ctl that is the service manager, that it started the postmaster, and that which subprocesses the postmaster started. You will also see the only second level postgres.exe, which is part of the stats system.

You will also notice in the Handle list, that there is an Event named pgident: .... This event contains the information you would find in the ps output on Unix and identifies the individual PostgreSQL process.

So - simply using a different tool gives you a lot more information. For 8.2 there will be some additional win32 management stuff, but more aobut that somewhere else.

Running as admin on win32 - take 2

My first patch missed out on the fact that while being able to run as admin is nice, it doesn't really help much if you can't initdb first. Because you need a database to run...

Extra patch now in the archives.

Linuxforum Copenhagen

Seems I completely forgot to post about this, but better late than never...

This year I'll be delivering a talk at Linuxforum Copenhagen. I'll be talking about some of the unique features in PostgreSQL, which means a lot about extensibility and such things. If you're around, come over and say hi!

(There's also plenty of other interesting talks there, and several speakers that are bound to be more interesting than me :-))

Running PostgreSQL as admin on Win32

Being able to run PostgreSQL from an accoun that is a local administrator is a much requested feature. There are times when this makes sense - say when you're test-running things from your own account, and you haven't yet taken the step to use a non-admin account. There are other cases when it doesn't, such as when you're creating a new account for the service anyway - there's just no point in making it an administrator then.

PostgreSQL on Unix has a check that prevents it from starting from the root account, and the same check exists on Windows in the form of checking for membership in Administrators or Power Users. This is a good thing, because it keeps the attack surface on the system way down. There are no plans to change this.

However, there is now a patch available (that will hopefully make it into 8.2, but we don't know that yet. It's been pending on -patches since jan 14th, but there are plenty of patches that have been waiting longer than that) that will let you start PostgreSQL from an adminstrative account. The code (which lives in pg_ctl) will then create and use what's known as a restricted token to launch the backend. This token will explicitly disable the Administrators and Power Users groups, as well as a whole lot of privileges that may be on by default. There is absolutely no way for anything executing in the database context (such as a stored procedure, or code injected through a security bug somewhere) to gain back these privileges. Since we take out the privileges as well, this actually makes the system slightly more secure than before.

The patch is available here, if you need it now. Should apply cleanly to 8.1 (and should work in 8.0, but may not apply cleanly). It only works on Windows 2000 and up, not NT4. But relly, you shouldn't be using NT4 any longer, for a lot of other reasons more important than this.

As a bonus, this patch wraps the database in a job object (when available, and will only work when running as a service) which makes it possible to do performance monitoring on the group of processes - such as total CPU used by all backends. Just use Performance Monitor, open up job objects, and it should list the PostgreSQL job object if it's running.

Hacking on pgAdmin

This weekend I finally took some time to dive into pgAdmin hacking. Previously, I've just nagged at Dave to get issues fixed (those few there were), but that doesn't work forever :-)

Turns out it was a lot easier than I thought. At least, that's what I think now. I've sent in several patches to the guys, and I guess I'll know if it was actually easy once I hear back from them. I might get them all back with a "yuck, this is ugly code" or something. But it does appear to at least work, which is a good first step.

In summary I have to say that it was really easy to understand the layout of the system in pgAdmin. Even though it has been many years since I did any serious C++ing, and I have never done any wx before...

The things I've worked on so far are:

  • Couple of tiny patches for building with VS2003 and for the unicode radiobuttons on the export screen
  • Ability to select all rows in the query output window, for copying, with a single keypress (Ctrl-A)
  • Ability to control quoting and separation when copying from the query results to the clipboard
  • Ability to auto-connect to a database and open a query window on startup using commandline parameters
  • Simple cut at autocompletion ("tab completion" / "intellisense" / whatever you might want to call it) for the SQL editor

Not bad for a first weekend, I think. Assuming, that is, it's actually accepted in the end...

Wow for contrib/intarray

In my ongoing quest to migrate some stuff from MSSQL to PostgreSQL, I've done my first implementation using contrib/intarray. And all I can say in this case is wow. It took some query times from 3+ seconds down to 0.4 milliseconds!! (Granted, that is on a faster machine as well - but not that much faster. And no longer dedicated. And the new machine actually has less memory than the old one).

Now, the old design was pretty darn bad. It basically had a varchar field that held a lot of numbers in the format "4,6,7,9,15". The objective was to find which rows contained say the number 9. The only way to do that in MSSQL is, AFAIK, to do a WHERE field LIKE '%259,%25', which is obviously dead slow.

For PostgreSQL, with the data stored in an intarray instead, the query is WHERE field @ '{9}'. You can even create a GIST index on it, for even faster lookups. And it's even more correct (the previous one will fail if the list has for example 19 in it. It had to be special-cased every time).

Finally, it makes it a lot easier to match for multiple values. Such as WHERE field @ '{9,14}'. For every step, the MSSQL solution just gets uglier and uglier, whereas PostgreSQL is nice and beautiful :-)

So. Arrays can be a bit of a hurdle to get into, and it will tie you down to PostgreSQL, but if you're doing something that deals with arrays or lists of integer, you should definitly check out contrib/intarray. Once again, a great job by Oleg and Teodor.

Conferences

I speak at and organize conferences around Open Source in general and PostgreSQL in particular.

Upcoming

PGDay Chicago 2024
Apr 26, 2024
Chicago, USA
PGConf.DEV 2024
May 28-31, 2024
Vancouver, Canada

Past

SCaLE 2024
Mar 14-17, 2024
Pasadena, USA
Nordic PGDay 2024
Mar 12, 2024
Oslo, Norway
FOSDEM PGDay 2024
Feb 2-4, 2024
Brussels, Belgium
PGConf.EU 2023
Dec 12-15, 2023
Prague, Czechia
PGConf.NYC 2023
Oct 3-5, 2023
New York, USA
More past conferences