[GENERAL] Switching from MySQL: ON DUPLICATE KEY UPDATE, plpgsql function

by APseudoUtopiaon 2009-06-28T00:24:37+00:00
Hey list,
I'm migrating my site away from MySQL to PostgreSQL. So far, it's been
going great. However, there's one problem I've been having trouble
solving.
I have a query which allows users to "Catch up" on read posts on the
forum. It works by either updating or inserting the "last post read"
number from every forum thread into the readposts table (for that
userid and threadid combination, of course). Here's the table
structure:
CREATE TABLE "forums_readposts" (
"userid" INTEGER NOT NULL REFERENCES "users_main" ("id") ON DELETE CASCADE,
"threadid" INTEGER NOT NULL REFERENCES "forums_topics" ("id") ON
DELETE CASCADE,
"lastpostread" INTEGER NOT NULL CHECK ("lastpostread" >= 0),
PRIMARY KEY ("userid", "threadid")
);
Here's the original MySQL query that I have (db_string is a php
function that escapes the string):
INSERT INTO "forums_readposts" ("userid", "threadid", "lastpostread")
SELECT ' . db_string($_SESSION['UserInfo']['id']) . ', "id",
"lastpost" FROM "forums_topics" ON DUPLICATE KEY UPDATE "lastpostread"
= "lastpost";
Obviously this will not work with PostgreSQL. I've googled around a
bit and I decided to create a plpgsql function to handle the task. I
don't have much done, but here's what I have:
-----------------------------
CREATE FUNCTION FORUM_CATCH_UP_ALL (INTEGER) RETURNS VOID AS
$FuncTag$
BEGIN
LOOP
-- Try to update the record
-- This query is broken. I'm not sure how to do the subquery or
whatever I need to do. Maybe FROM? Another loop?
-- UPDATE "forums_readposts" SET "lastpostread" = (SELECT
"lastpost" FROM "forums_topics" WHERE blah blah
IF found THEN
RETURN;
END IF;
-- Not there, try to insert the key
-- If someone else inserts the same key concurrently,
-- We could get a unique-key failure
BEGIN
INSERT INTO "forums_readposts" ("userid", "threadid",
"lastpostread") (SELECT $1, "id", "lastpost" FROM "forums_topics")
WHERE "userid" = $1;
RETURN;
EXCEPTION WHEN unique_violation THEN
-- Do nothing, and loop to try the update again
END;
END LOOP;
END;
$FuncTag$
LANGUAGE plpgsql;
-----------------------------
I got the structure from the example in the postgresql documentation.
Hopefully it's a step in the right direction.
If anyone can point me in the direction to take another step in, I'd
really appreciate it.
Thanks.
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general






APseudoUtopia wrote:

Hey list,
I have a query which allows users to "Catch up" on read posts on the
forum. It works by either updating or inserting the "last post read"
number from every forum thread into the readposts table (for that
userid and threadid combination, of course). Here's the table
structure:


Wouldn't a view be better than having a table that is deleted and
updated all the time.  I would add a field in the user table called 
last_login type timestamp  then do a select from the forums table to
generate this table where last_login <= FormTimeStamp .

I don't see the point having this table when a view would work better. 






Obviously this will not work with PostgreSQL. I've googled around a
bit and I decided to create a plpgsql function to handle the task. I
don't have much done, but here's what I have:
-----------------------------
CREATE FUNCTION FORUM_CATCH_UP_ALL (INTEGER) RETURNS VOID AS
$FuncTag$
BEGIN
LOOP
-- Try to update the record
-- This query is broken. I'm not sure how to do the subquery or
whatever I need to do. Maybe FROM? Another loop?
-- UPDATE "forums_readposts" SET "lastpostread" = (SELECT
"lastpost" FROM "forums_topics" WHERE blah blah
IF found THEN
RETURN;
END IF;
-- Not there, try to insert the key
-- If someone else inserts the same key concurrently




-- We could get a unique-key failure
BEGIN
INSERT INTO "forums_readposts" ("userid", "threadid",
"lastpostread") (SELECT $1, "id", "lastpost" FROM "forums_topics")
WHERE "userid" = $1;
RETURN;
EXCEPTION WHEN unique_violation THEN
-- Do nothing, and loop to try the update again
END;
END LOOP;
END;
$FuncTag$
LANGUAGE plpgsql;
-----------------------------



if you want to do something like this either do a test first to see if
the key is present in the table, update or do an insert like this
There is no reason to do a loop in the function waiting for a lock to
clear.   Postgresql Locks do not work like MySQL. 
CREATE FUNCTION FORUM_CATCH_UP_ALL (pUserID INTEGER) RETURNS VOID AS
$FuncTag$
BEGIN
                      select lastpostread  from forums_readposts  where
userid = pUserId;
                      if (  found() ) then
                         UPDATE forums_readposts SET "lastpostread" =
(SELECT lastpost FROM forums_topics WHERE blah blah) ; --its helpful to
post the entire function ;
                      else
                         INSERT INTO forums_readposts ( userid,
threadid,
                                    lastpostread) (SELECT $1, id,
lastpost FROM forums_topics)
                                    WHERE userid = pUserID;
                      end;
END;
$FuncTag$
LANGUAGE plpgsql;



I got the structure from the example in the postgresql documentation.
Hopefully it's a step in the right direction.
If anyone can point me in the direction to take another step in, I'd
really appreciate it.
Thanks.



Creating a view would work better and than creating a table to track
this.  I would think this website tracks the last time the user logged
in correct???  This is going to create allot of over head maintaining
this table when a simple select statement will work so much better if i
understand what you are doing.....





contempating installing a insert=2Cupdate trigger on forums_topics table so=
mething like
SET search_path =3D public=3B
CREATE TRIGGER "MyTableName_Trig"=20
AFTER INSERT OR DELETE OR UPDATE ON "forum_topics"
FOR EACH ROW EXECUTE PROCEDURE "FORUM_CATCH_UP_ALL" ()=3B
HTH
Martin=20
______________________________________________=20
Verzicht und Vertraulichkeitanmerkung/Note de d=E9ni et de confidentialit=
=E9
=20
Diese Nachricht ist vertraulich. Sollten Sie nicht der vorgesehene Empfaeng=
er sein=2C so bitten wir hoeflich um eine Mitteilung. Jede unbefugte Weiter=
leitung oder Fertigung einer Kopie ist unzulaessig. Diese Nachricht dient l=
ediglich dem Austausch von Informationen und entfaltet keine rechtliche Bin=
dungswirkung. Aufgrund der leichten Manipulierbarkeit von E-Mails koennen w=
ir keine Haftung fuer den Inhalt uebernehmen.
Ce message est confidentiel et peut =EAtre privil=E9gi=E9. Si vous n'=EAtes=
pas le destinataire pr=E9vu=2C nous te demandons avec bont=E9 que pour sat=
isfaire informez l'exp=E9diteur. N'importe quelle diffusion non autoris=E9e=
ou la copie de ceci est interdite. Ce message sert =E0 l'information seule=
ment et n'aura pas n'importe quel effet l=E9galement obligatoire. =C9tant d=
onn=E9 que les email peuvent facilement =EAtre sujets =E0 la manipulation=
=2C nous ne pouvons accepter aucune responsabilit=E9 pour le contenu fourni=
.

Date: Sat=2C 27 Jun 2009 21:13:23 -0400
From: justin@emproshunts.com
To: apseudoutopia@gmail.com=3B pgsql-general@postgresql.org
Subject: Re: [GENERAL] Switching from MySQL: ON DUPLICATE KEY UPDATE=2C plp=
gsql function

=20
APseudoUtopia wrote:
Hey list=2C
I have a query which allows users to "Catch up" on read posts on the
forum. It works by either updating or inserting the "last post read"
number from every forum thread into the readposts table (for that
userid and threadid combination=2C of course). Here's the table
structure:
=20
Wouldn't a view be better than having a table that is deleted and
updated all the time. I would add a field in the user table called=20
last_login type timestamp then do a select from the forums table to
generate this table where last_login
On Sun, Jun 28, 2009 at 2:13 AM, justin wrote:
>
> if you want to do something like this either do a test first to see if the
> key is present in the table, update or do an insert like this
> There is no reason to do a loop in the function waiting for a lock to
> clear.=A0=A0 Postgresql Locks do not work like MySQL.
The latter is exactly what he posted, you *do* have to loop because
two sessions can try to do the update, find no records, and then both
try to insert and fail.
The problem is that the example posted is for a single update/insert.
The problem you're solving is for merging in a whole set of changes.
That's a bit more painful.
I think you're going to want something like;
UPDATE forums_readposts
SET lastpostread=3D(
select lastpost
from forums_topics
where id=3Dthreadid
)
WHERE userid=3D_id
INSERT INTO forums_readposts
(userid,threadid,lastpostread)
(select _userid, id, lastpost
from forums_topics
where id not in (
select threadid
from forum_readposts existing
where existing.userid=3D_userid
)
)
(you might want to experiment with that as an NOT EXISTS as there are
still cases where one is optimized better than the other due to the
standard's required null behaviour)
You have a few options here. You could just decide concurrency for
this operation really isn't important and use something to serialize
this operation. For example you could lock the user record with an
explicit select for update on the user record and the commit
immediately afterward.
Or you could catch the exception around the insert and assume if that
happened you don't have to bother retrying because the other
transaction you collided with is presumably doing the same thing. That
would break if a user hit "catch up" and simultaneously clicked on a
new thread he hadn't read before in another window.
Or you could do the same kind of loop around this, just always doing
the insert since it should insert 0 records if there are no missing
threads.
You could skip the insert entirely if the number of records updated in
matches the number of threads and you have that number handy. That
would be an especially good idea if you catch the exception around the
insert since exceptions are moderately expensive. They create a
subtransaction. Probably not a factor for an operation like this which
isn't dominating the workload.
--=20
greg
http://mit.edu/~gsstark/resume.pdf
--=20
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Sat, Jun 27, 2009 at 9:13 PM, justin wrote:
> APseudoUtopia wrote:
>
> Hey list,
>
> I have a query which allows users to "Catch up" on read posts on the
> forum. It works by either updating or inserting the "last post read"
> number from every forum thread into the readposts table (for that
> userid and threadid combination, of course). Here's the table
> structure:
>
>
> Wouldn't a view be better than having a table that is deleted and updated
> all the time.=C2=A0 I would add a field in the user table called=C2=A0 la=
st_login type
> timestamp=C2=A0 then do a select from the forums table to generate this t=
able
> where last_login
> I don't see the point having this table when a view would work better.
>
>
--snip--
>
> Creating a view would work better and than creating a table to track this.
> I would think this website tracks the last time the user logged in
> correct???=C2=A0 This is going to create allot of over head maintaining t=
his
> table when a simple select statement will work so much better if i
> understand what you are doing.....
>
>
>
That wouldn't work. What if a user logs in, reads only one forum
thread, then logs out (intending to read all the other forum threads
at some point in the future when they log in again). If I used a VIEW,
it would automatically consider all those unread forum posts to be
read when the user logs out.
--=20
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general








APseudoUtopia wrote:

thread, then logs out (intending to read all the other forum threads
at some point in the future when they log in again). If I used a VIEW,
it would automatically consider all those unread forum posts to be
read when the user logs out.

That wouldn't work. What if a user logs in, reads only one forum


You are keeping a list of all the forums a user has read,  i would not
worry about making sure the table tracking user activity has duplicate
key values. The select can be limited to return just on row with the
highest time stamp then compare this result to figure out what forms
the user has not read yet.  This eliminates one of problems but creates
a problem where table tracking user activity is going bloat but in low
traffic times delete the duplicate values.

A similar topic was discussed  on the performance  mailing list, where
updates are hung for several seconds for a similar tracking table...
http://archives.postgresql.org/pgsql-performance/2009-06/msg00300.php

 



To migrate the site, you can use an open source ETL tool.
Talend Open Studio is an open source ETL tool for data integration and
migration experts. It's easy to learn for a non-technical user. What
distinguishes Talend, when it comes to business users, is the tMap
component. It allows the user to get a graphical and functional view of
integration processes.=20
For more information: http://www.talend.com/
Justin-95 wrote:
>=20
>=20
>=20
>=20
>=20=20=20
>=20
>=20
>=20
>=20
> APseudoUtopia wrote:
>=20
> thread, then logs out (intending to read all the other forum threads
> at some point in the future when they log in again). If I used a VIEW,
> it would automatically consider all those unread forum posts to be
> read when the user logs out.
>=20
>=20=20=20
> That wouldn't work. What if a user logs in, reads only one forum=20
>=20
>=20
> You are keeping a list of all the forums a user has read,=C2=A0 i would n=
ot
> worry about making sure the table tracking user activity has duplicate
> key values. The select can be limited to return just on row with the
> highest time stamp then compare this result to figure out what forms
> the user has not read yet.=C2=A0 This eliminates one of problems but crea=
tes
> a problem where table tracking user activity is going bloat but in low
> traffic times delete the duplicate values.=20
>=20
> A similar topic was discussed=C2=A0 on the performance=C2=A0 mailing list=
, where
> updates are hung for several seconds for a similar tracking table...=20
> http://archives.postgresql.org/pgsql-performance/2009-06/msg00300.php=20
>=20
> =C2=A0=20
>=20
>=20
>=20
>=20
--=20
View this message in context: http://www.nabble.com/Switching-from-MySQL%3A=
-ON-DUPLICATE-KEY-UPDATE%2C-plpgsql-function-tp24237803p24254206.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.
--=20
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Sat, Jun 27, 2009 at 08:23:26PM -0400, APseudoUtopia wrote:
- Hey list,
-
- I'm migrating my site away from MySQL to PostgreSQL. So far, it's been
- going great. However, there's one problem I've been having trouble
- solving.
-
- I have a query which allows users to "Catch up" on read posts on the
- forum. It works by either updating or inserting the "last post read"
- number from every forum thread into the readposts table (for that
- userid and threadid combination, of course). Here's the table
- structure:
-
- CREATE TABLE "forums_readposts" (
- "userid" INTEGER NOT NULL REFERENCES "users_main" ("id") ON DELETE CASCADE,
- "threadid" INTEGER NOT NULL REFERENCES "forums_topics" ("id") ON
- DELETE CASCADE,
- "lastpostread" INTEGER NOT NULL CHECK ("lastpostread" >= 0),
- PRIMARY KEY ("userid", "threadid")
- );
-
- Here's the original MySQL query that I have (db_string is a php
- function that escapes the string):
-
- INSERT INTO "forums_readposts" ("userid", "threadid", "lastpostread")
- SELECT ' . db_string($_SESSION['UserInfo']['id']) . ', "id",
- "lastpost" FROM "forums_topics" ON DUPLICATE KEY UPDATE "lastpostread"
- = "lastpost";
So regardless of other design issues. (i.e., assuming what you have was working
in MySQL).
Wouldn't you just be looking for something like:
BEGIN;
EXECUTE 'insert into forums_readposts values ('...')';
EXCEPTION when unique_violation THEN
EXECUTE 'update forums_readposts set lastpostread = '...' ';
END;
The logic as i read your post is. If the user's never done a "catchup" operation
before, this will create the record. If he has, then it will update this record
to reflect the new transid.
Dave
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Mon, Jun 29, 2009 at 2:26 PM, David Kerr wrote:
> On Sat, Jun 27, 2009 at 08:23:26PM -0400, APseudoUtopia wrote:
> - Hey list,
> -
> - I'm migrating my site away from MySQL to PostgreSQL. So far, it's been
> - going great. However, there's one problem I've been having trouble
> - solving.
> -
> - I have a query which allows users to "Catch up" on read posts on the
> - forum. It works by either updating or inserting the "last post read"
> - number from every forum thread into the readposts table (for that
> - userid and threadid combination, of course). Here's the table
> - structure:
> -
> - CREATE TABLE "forums_readposts" (
> - =C2=A0"userid" =C2=A0 =C2=A0INTEGER NOT NULL REFERENCES "users_main" ("=
id") ON DELETE CASCADE,
> - =C2=A0"threadid" =C2=A0 =C2=A0INTEGER NOT NULL REFERENCES "forums_topic=
s" ("id") ON
> - DELETE CASCADE,
> - =C2=A0"lastpostread" =C2=A0 INTEGER NOT NULL CHECK ("lastpostread" >=3D=
0),
> - =C2=A0PRIMARY KEY ("userid", "threadid")
> - );
> -
> - Here's the original MySQL query that I have (db_string is a php
> - function that escapes the string):
> -
> - INSERT INTO "forums_readposts" ("userid", "threadid", "lastpostread")
> - SELECT ' . db_string($_SESSION['UserInfo']['id']) . ', "id",
> - "lastpost" FROM "forums_topics" ON DUPLICATE KEY UPDATE "lastpostread"
> - =3D "lastpost";
>
> So regardless of other design issues. (i.e., assuming what you have was w=
orking
> in MySQL).
>
> Wouldn't you just be looking for something like:
>
> BEGIN;
> =C2=A0EXECUTE 'insert into forums_readposts values ('...')';
> =C2=A0 =C2=A0 =C2=A0 =C2=A0EXCEPTION when unique_violation THEN
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0EXECUTE 'update fo=
rums_readposts set lastpostread =3D '...' ';
> END;
>
> The logic as i read your post is. If the user's never done a "catchup" op=
eration
> before, this will create the record. If he has, then it will update this =
record
> to reflect the new transid.
>
> Dave
>
Hmm, yeah. I'm new to PostgreSQL, and have little experience with
MySQL other than basic queries and joins. I didn't get to the part in
the docs that covers EXECUTE yet, haha. That looks like it would do
exactly what I want it to.
I appreciate it.
--=20
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Related Lists