- Previous thread: Extracting text using Beautifulsoup
- Next thread: virtualenv under Win7: easy_install fails in virtual environments
- Threads sorted by date: python 200910
Hello there,
The last couple of days I've been working on an experimental rewrite of
the GIL. Since the work has been turning out rather successful (or, at
least, not totally useless and crashing!) I thought I'd announce it
here.
First I want to stress this is not about removing the GIL. There still
is a Global Interpreter Lock which serializes access to most parts of
the interpreter. These protected parts haven't changed either, so Python
doesn't become really better at extracting computational parallelism out
of several cores.
Goals
-----
The new GIL (which is also the name of the sandbox area I've committed
it in, "newgil") addresses the following issues :
1) Switching by opcode counting. Counting opcodes is a very crude way of
estimating times, since the time spent executing a single opcode can
very wildly. Litterally, an opcode can be as short as a handful of
nanoseconds (think something like "... is not None") or as long as a
fraction of second, or even longer (think calling a heavy non-GIL
releasing C function, such as re.search()). Therefore, releasing the GIL
every 100 opcodes, regardless of their length, is a very poor policy.
The new GIL does away with this by ditching _Py_Ticker entirely and
instead using a fixed interval (by default 5 milliseconds, but settable)
after which we ask the main thread to release the GIL and let another
thread be scheduled.
2) GIL overhead and efficiency in contended situations. Apparently, some
OSes (OS X mainly) have problems with lock performance when the lock is
already taken: the system calls are heavy. This is the "Dave Beazley
effect", where he took a very trivial loop, therefore made of very short
opcodes and therefore releasing the GIL very often (probably 100000
times a second), and runs it in one or two threads on an OS with poor
lock performance (OS X). He sees a 50% increase in runtime when using
two threads rather than one, in what is admittedly a pathological case.
Even on better platforms such as Linux, eliminating the overhead of many
GIL acquires and releases (since the new GIL is released on a fixed time
basis rather than on an opcode counting basis) yields slightly better
performance (read: a smaller performance degradation :-)) when there are
several pure Python computation threads running.
3) Thread switching latency. The traditional scheme merely releases the
GIL for a couple of CPU cycles, and reacquires it immediately.
Unfortunately, this doesn't mean the OS will automatically switch to
another, GIL-awaiting thread. In many situations, the same thread will
continue running. This, with the opcode counting scheme, is the reason
why some people have been complaining about latency problems when an I/O
thread competes with a computational thread (the I/O thread wouldn't be
scheduled right away when e.g. a packet arrives; or rather, it would be
scheduled by the OS, but unscheduled immediately when trying to acquire
the GIL, and it would be scheduled again only much later).
The new GIL improves on this by combinating two mechanisms:
- forced thread switching, which means that when the switching interval
is terminated (mentioned in 1) and the GIL is released, we will force
any of the threads waiting on the GIL to be scheduled instead of the
formerly GIL-holding thread. Which thread exactly is an OS decision,
however: the goal here is not to have our own scheduler (this could be
discussed but I wanted the design to remain simple :-) After all,
man-years of work have been invested in scheduling algorithms by kernel
programming teams).
- priority requests, which is an option for a thread requesting the GIL
to be scheduled as soon as possible, and forcibly (rather than any other
threads). This is meant to be used by GIL-releasing methods such as
read() on files and sockets. The scheme, again, is very simple: when a
priority request is done by a thread, the GIL is released as soon as
possible by the thread holding it (including in the eval loop), and then
the thread making the priority request is forcibly scheduled (by making
all other GIL-awaiting threads wait in the meantime).
Implementation
--------------
The new GIL is implemented using a couple of mutexes and condition
variables. A {mutex, condition} pair is used to protect the GIL itself,
which is a mere variable named `gil_locked` (there are a couple of other
variables for bookkeeping). Another {mutex, condition} pair is used for
forced thread switching (described above). Finally, a separate mutex is
used for priority requests (described above).
The code is in the sandbox:
http://svn.python.org/view/sandbox/trunk/newgil/
The file of interest is Python/ceval_gil.h. Changes in other files are
very minimal, except for priority requests which have been added at
strategic places (some methods of I/O modules). Also, the code remains
rather short, while of course being less trivial than the old one.
NB : this is a branch of py3k. There should be no real difficulty
porting it back to trunk, provided someone wants to do the job.
Platforms
---------
I've implemented the new GIL for POSIX and Windows (tested under Linux
and Windows XP (running in a VM)). Judging by what I can read in the
online MSDN docs, the Windows support should include everything from
Windows 2000, and probably recent versions of Windows CE.
Other platforms aren't implemented, because I don't have access to the
necessary hardware. Besides, I must admit I'm not very motivated in
working on niche/obsolete systems. I've e-mailed Andrew MacIntyre in
private to ask him if he'd like to do the OS/2 support.
Supporting a new platform is not very difficult: it's a matter of
writing the 50-or-so lines of necessary platform-specific macros at the
beginning of Python/ceval_gil.h.
The reason I couldn't use the existing thread support
(Python/thread_*.h) is that these abstractions are too poor. Mainly,
they don't provide:
- events, conditions or an equivalent thereof
- the ability to acquire a resource with a timeout
Measurements
------------
Before starting this work, I wrote ccbench (*), a little benchmark
script ("ccbench" being a shorthand for "concurrency benchmark") which
measures two things:
- computation throughput with one or several concurrent threads
- latency to external events (I use an UDP socket) when there is zero,
one, or several background computation threads running
(*) http://svn.python.org/view/sandbox/trunk/ccbench/
The benchmark involves several computation workloads with different GIL
characteristics. By default there are 3 of them:
A- one pure Python workload (computation of a number of digits of pi):
that is, something which spends its time in the eval loop
B- one mostly C workload where the C implementation doesn't release the
GIL (regular expression matching)
C- one mostly C workload where the implementation does release the GIL
(bz2 compression)
In the ccbench directory you will find benchmark results, under Linux,
for two different systems I have here. The new GIL shows roughly similar
but slightly better throughput results than the old one. And it is much
better in the latency tests, especially in workload B (going down from
almost a second of average latency with the old GIL, to a couple of
milliseconds with the new GIL). This is the combined result of using a
time-based scheme (rather than opcode-based) and of forced thread
switching (rather than relying on the OS to actually switch threads when
we speculatively release the GIL).
As a sidenote, I might mention that single-threaded performance is not
degraded at all. It is, actually, theoretically a bit better because the
old ticker check in the eval loop becomes simpler; however, this goes
mostly unnoticed.
Now what remains to be done?
Having other people test it would be fine. Even better if you have an
actual multi-threaded py3k application. But ccbench results for other
OSes would be nice too :-)
(I get good results under the Windows XP VM but I feel that a VM is not
an ideal setup for a concurrency benchmark)
Of course, studying and reviewing the code is welcome. As for
integrating it into the mainline py3k branch, I guess we have to answer
these questions:
- is the approach interesting? (we could decide that it's just not worth
it, and that a good GIL can only be a dead (removed) GIL)
- is the patch good, mature and debugged enough?
- how do we deal with the unsupported platforms (POSIX and Windows
support should cover most bases, but the fate of OS/2 support depends on
Andrew)?
Regards
Antoine.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
--000e0cd2a10ab741c30476c988bf
Content-Type: text/plain; charset=UTF-8
>
> [SNIP - a lot of detail on what sounds like a good design]
>
> Now what remains to be done?
>
> Having other people test it would be fine. Even better if you have an
> actual multi-threaded py3k application. But ccbench results for other
> OSes would be nice too :-)
> (I get good results under the Windows XP VM but I feel that a VM is not
> an ideal setup for a concurrency benchmark)
>
> Of course, studying and reviewing the code is welcome. As for
> integrating it into the mainline py3k branch, I guess we have to answer
> these questions:
> - is the approach interesting? (we could decide that it's just not worth
> it, and that a good GIL can only be a dead (removed) GIL)
>
I think it's worth it. Removal of the GIL is a totally open-ended problem
with no solution in sight. This, on the other hand, is a performance benefit
now. I say move forward with this. If it happens to be short-lived because
some actually figures out how to remove the GIL then great, but is that
really going to happen between now and Python 3.2? I doubt it.
> - is the patch good, mature and debugged enough?
> - how do we deal with the unsupported platforms (POSIX and Windows
> support should cover most bases, but the fate of OS/2 support depends on
> Andrew)?
>
>
It's up to Andrew to get the support in. While I have faith he will, this is
why we have been scaling back the support for alternative OSs for a while
and will continue to do so. I suspect the day Andrew stops keeping up will
be the day we push to have OS/2 be externally maintained.
-Brett
--000e0cd2a10ab741c30476c988bf
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
[SNIP - a lot of=
detail on what sounds like a good design]
Now what remains to be done?
Having other people test it would be fine. Even better if you have an
actual multi-threaded py3k application. But ccbench results for other
OSes would be nice too :-)
(I get good results under the Windows XP VM but I feel that a VM is not
an ideal setup for a concurrency benchmark)
Of course, studying and reviewing the code is welcome. As for
integrating it into the mainline py3k branch, I guess we have to answer
these questions:
- is the approach interesting? (we could decide that it's just not wort=
h
it, and that a good GIL can only be a dead (removed) GIL)I think it's worth it. Removal of the GIL is a total=
ly open-ended problem with no solution in sight. This, on the other hand, i=
s a performance benefit now. I say move forward with this. If it happens to=
be short-lived because some actually figures out how to remove the GIL the=
n great, but is that really going to happen between now and Python 3.2? I d=
oubt it.
=C2=A0
- is the patch good, mature and debugged enough?
- how do we deal with the unsupported platforms (POSIX and Windows
support should cover most bases, but the fate of OS/2 support depends on
Andrew)?
It's up to Andrew to get the suppo=
rt in. While I have faith he will, this is why we have been scaling back th=
e support for alternative OSs for a while and will continue to do so. I sus=
pect the day Andrew stops keeping up will be the day we push to have OS/2 b=
e externally maintained.
-Brett
--000e0cd2a10ab741c30476c988bf--
Content-Type: text/plain; charset=UTF-8
>
> [SNIP - a lot of detail on what sounds like a good design]
>
> Now what remains to be done?
>
> Having other people test it would be fine. Even better if you have an
> actual multi-threaded py3k application. But ccbench results for other
> OSes would be nice too :-)
> (I get good results under the Windows XP VM but I feel that a VM is not
> an ideal setup for a concurrency benchmark)
>
> Of course, studying and reviewing the code is welcome. As for
> integrating it into the mainline py3k branch, I guess we have to answer
> these questions:
> - is the approach interesting? (we could decide that it's just not worth
> it, and that a good GIL can only be a dead (removed) GIL)
>
I think it's worth it. Removal of the GIL is a totally open-ended problem
with no solution in sight. This, on the other hand, is a performance benefit
now. I say move forward with this. If it happens to be short-lived because
some actually figures out how to remove the GIL then great, but is that
really going to happen between now and Python 3.2? I doubt it.
> - is the patch good, mature and debugged enough?
> - how do we deal with the unsupported platforms (POSIX and Windows
> support should cover most bases, but the fate of OS/2 support depends on
> Andrew)?
>
>
It's up to Andrew to get the support in. While I have faith he will, this is
why we have been scaling back the support for alternative OSs for a while
and will continue to do so. I suspect the day Andrew stops keeping up will
be the day we push to have OS/2 be externally maintained.
-Brett
--000e0cd2a10ab741c30476c988bf
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
[SNIP - a lot of=
detail on what sounds like a good design]
Now what remains to be done?
Having other people test it would be fine. Even better if you have an
actual multi-threaded py3k application. But ccbench results for other
OSes would be nice too :-)
(I get good results under the Windows XP VM but I feel that a VM is not
an ideal setup for a concurrency benchmark)
Of course, studying and reviewing the code is welcome. As for
integrating it into the mainline py3k branch, I guess we have to answer
these questions:
- is the approach interesting? (we could decide that it's just not wort=
h
it, and that a good GIL can only be a dead (removed) GIL)I think it's worth it. Removal of the GIL is a total=
ly open-ended problem with no solution in sight. This, on the other hand, i=
s a performance benefit now. I say move forward with this. If it happens to=
be short-lived because some actually figures out how to remove the GIL the=
n great, but is that really going to happen between now and Python 3.2? I d=
oubt it.
=C2=A0
- is the patch good, mature and debugged enough?
- how do we deal with the unsupported platforms (POSIX and Windows
support should cover most bases, but the fate of OS/2 support depends on
Andrew)?
It's up to Andrew to get the suppo=
rt in. While I have faith he will, this is why we have been scaling back th=
e support for alternative OSs for a while and will continue to do so. I sus=
pect the day Andrew stops keeping up will be the day we push to have OS/2 b=
e externally maintained.
-Brett
--000e0cd2a10ab741c30476c988bf--
Antoine Pitrou wrote:
> Hello there,
>
> The last couple of days I've been working on an experimental rewrite of
> the GIL. Since the work has been turning out rather successful (or, at
> least, not totally useless and crashing!) I thought I'd announce it
> here.
I am curious as to whether the entire mechanism is or can be turned off
when not needed -- when there are not threads (other than the main,
starting thread)?
tjr
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
> Hello there,
>
> The last couple of days I've been working on an experimental rewrite of
> the GIL. Since the work has been turning out rather successful (or, at
> least, not totally useless and crashing!) I thought I'd announce it
> here.
I am curious as to whether the entire mechanism is or can be turned off
when not needed -- when there are not threads (other than the main,
starting thread)?
tjr
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
Terry Reedy writes:
>
> I am curious as to whether the entire mechanism is or can be turned off
> when not needed -- when there are not threads (other than the main,
> starting thread)?
It is an implicit feature: when no thread is waiting on the GIL, the GIL-holding
thread isn't notified and doesn't try to release it at all (in the eval loop,
that is; GIL-releasing C extensions still release it).
Note that "no thread is waiting on the GIL" can mean one of two things:
- either there is only one Python thread
- or the other Python threads are doing things with the GIL released (zlib/bz2
compression, waiting on I/O, sleep()ing, etc.)
So, yes, it automatically "turns itself off".
Regards
Antoine.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
>
> I am curious as to whether the entire mechanism is or can be turned off
> when not needed -- when there are not threads (other than the main,
> starting thread)?
It is an implicit feature: when no thread is waiting on the GIL, the GIL-holding
thread isn't notified and doesn't try to release it at all (in the eval loop,
that is; GIL-releasing C extensions still release it).
Note that "no thread is waiting on the GIL" can mean one of two things:
- either there is only one Python thread
- or the other Python threads are doing things with the GIL released (zlib/bz2
compression, waiting on I/O, sleep()ing, etc.)
So, yes, it automatically "turns itself off".
Regards
Antoine.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
Brett Cannon wrote:
> It's up to Andrew to get the support in. While I have faith he will,
> this is why we have been scaling back the support for alternative OSs
> for a while and will continue to do so. I suspect the day Andrew stops
> keeping up will be the day we push to have OS/2 be externally maintained.
Notwithstanding my desire to keep OS/2 supported in the Python tree,
keeping up has been more difficult of late:
- OS/2 is unquestionably a "legacy" environment, with system APIs
different in flavour and semantics from the current mainstream (though
surprisingly capable in many ways despite its age).
- The EMX runtime my OS/2 port currently relies on to abstract the
system API to a Posix-ish API is itself a legacy package, essentially
unmaintained for some years :-( This has been a source of increasing
pain as Python has moved with the mainstream... with regard to Unicode
support and threads in conjunction with multi-processing, in particular.
Real Life hasn't been favourably disposed either...
I have refrained from applying the extensive patches required to make
the port feature complete for 2.6 and later while I investigate an
alternate Posix emulating runtime (derived from FreeBSD's C library,
and which is used by Mozilla on OS/2), which would allow me to dispense
with most of these patches. But it has an issue or two of its own...
The cost in effort has been compounded by effectively having to try and
maintain two ports - 2.x and 3.x. And the 3.x port has suffered more
as its demands are higher.
So while I asked to keep the OS/2 thread support alive, if a decision
were to be taken to remove OS/2 support from the Python 3.x sources I
could live with that. A completed migration to Mercurial might well
make future port maintenance easier for me.
Regards,
Andrew.
--
-------------------------------------------------------------------------
Andrew I MacIntyre "These thoughts are mine alone..."
E-mail: andymac@bullseye.apana.org.au (pref) | Snail: PO Box 370
andymac@pcug.org.au (alt) | Belconnen ACT 2616
Web: http://www.andymac.org/ | Australia
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
> It's up to Andrew to get the support in. While I have faith he will,
> this is why we have been scaling back the support for alternative OSs
> for a while and will continue to do so. I suspect the day Andrew stops
> keeping up will be the day we push to have OS/2 be externally maintained.
Notwithstanding my desire to keep OS/2 supported in the Python tree,
keeping up has been more difficult of late:
- OS/2 is unquestionably a "legacy" environment, with system APIs
different in flavour and semantics from the current mainstream (though
surprisingly capable in many ways despite its age).
- The EMX runtime my OS/2 port currently relies on to abstract the
system API to a Posix-ish API is itself a legacy package, essentially
unmaintained for some years :-( This has been a source of increasing
pain as Python has moved with the mainstream... with regard to Unicode
support and threads in conjunction with multi-processing, in particular.
Real Life hasn't been favourably disposed either...
I have refrained from applying the extensive patches required to make
the port feature complete for 2.6 and later while I investigate an
alternate Posix emulating runtime (derived from FreeBSD's C library,
and which is used by Mozilla on OS/2), which would allow me to dispense
with most of these patches. But it has an issue or two of its own...
The cost in effort has been compounded by effectively having to try and
maintain two ports - 2.x and 3.x. And the 3.x port has suffered more
as its demands are higher.
So while I asked to keep the OS/2 thread support alive, if a decision
were to be taken to remove OS/2 support from the Python 3.x sources I
could live with that. A completed migration to Mercurial might well
make future port maintenance easier for me.
Regards,
Andrew.
--
-------------------------------------------------------------------------
Andrew I MacIntyre "These thoughts are mine alone..."
E-mail: andymac@bullseye.apana.org.au (pref) | Snail: PO Box 370
andymac@pcug.org.au (alt) | Belconnen ACT 2616
Web: http://www.andymac.org/ | Australia
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
Antoine Pitrou skrev:
> - priority requests, which is an option for a thread requesting the GIL
> to be scheduled as soon as possible, and forcibly (rather than any other
> threads).
So Python threads become preemptive rather than cooperative? That would
be great. :-)
time.sleep should generate a priority request to re-acquire the GIL; and
so should all other blocking standard library functions with a time-out.
S.M.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
> - priority requests, which is an option for a thread requesting the GIL
> to be scheduled as soon as possible, and forcibly (rather than any other
> threads).
So Python threads become preemptive rather than cooperative? That would
be great. :-)
time.sleep should generate a priority request to re-acquire the GIL; and
so should all other blocking standard library functions with a time-out.
S.M.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
> -----Original Message-----
> From: python-dev-bounces+kristjan=ccpgames.com@python.org
> [mailto:python-dev-bounces+kristjan=ccpgames.com@python.org] On Behalf
> Of Sturla Molden
> time.sleep should generate a priority request to re-acquire the GIL;
> and
> so should all other blocking standard library functions with a time-
> out.
I don't agree. You have to be very careful with priority. time.sleep() does not promise to wake up in any timely manner, and neither do the timeout functions. Rather, the timeout is a way to prevent infinite wait.
In my experience (from stackless python) using priority wakeup for IO can result in very erratic scheduling when there is much IO going on, every IO trumping another. You should stick to round robin except for very special and carefully analysed cases.
K
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
On Oct 26, 2009, at 10:09 AM, Kristj=E1n Valur J=F3nsson wrote:
>
>
>> -----Original Message-----
>> From: python-dev-bounces+kristjan=3Dccpgames.com@python.org
>> [mailto:python-dev-bounces+kristjan=3Dccpgames.com@python.org] On =
>> Behalf
>> Of Sturla Molden
>> time.sleep should generate a priority request to re-acquire the GIL;
>> and
>> so should all other blocking standard library functions with a time-
>> out.
>
> I don't agree. You have to be very careful with priority. =
> time.sleep() does not promise to wake up in any timely manner, and =
> neither do the timeout functions. Rather, the timeout is a way to =
> prevent infinite wait.
>
> In my experience (from stackless python) using priority wakeup for =
> IO can result in very erratic scheduling when there is much IO going =
> on, every IO trumping another. You should stick to round robin =
> except for very special and carefully analysed cases.
All the IO tasks can also go in their own round robin so that CPU time =
is correctly shared among all waiting IO tasks.
IOW, to make sure that all IO tasks get a fair share *in relation to =
all other IO tasks*.
Tasks can be put into the IO round robin when they "pull the IO alarm" =
so to speak, so there's no need to decide before-hand which task goes =
in which round robin pool.
I'm not familiar with this particular code in Python, but I've used =
this in other systems for years to make sure that IO tasks don't =
starve the rest of the system and that the most "insistent" IO task =
doesn't starve all the others.
S
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs=
.net
Antoine Pitrou skrev:
> - priority requests, which is an option for a thread requesting the GIL
> to be scheduled as soon as possible, and forcibly (rather than any other
> threads). T
Should a priority request for the GIL take a priority number?
- If two threads make a priority requests for the GIL, the one with the
higher priority should get the GIL first.
- If a thread with a low priority make a priority request for the GIL,
it should not be allowed to "preempt" (take the GIL away from) a
higher-priority thread, in which case the priority request would be
ignored.
Related issue: Should Python threads have priorities? They are after all
real OS threads.
S.M.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
> - priority requests, which is an option for a thread requesting the GIL
> to be scheduled as soon as possible, and forcibly (rather than any other
> threads). T
Should a priority request for the GIL take a priority number?
- If two threads make a priority requests for the GIL, the one with the
higher priority should get the GIL first.
- If a thread with a low priority make a priority request for the GIL,
it should not be allowed to "preempt" (take the GIL away from) a
higher-priority thread, in which case the priority request would be
ignored.
Related issue: Should Python threads have priorities? They are after all
real OS threads.
S.M.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
Sturla Molden writes:
>
> Antoine Pitrou skrev:
> > - priority requests, which is an option for a thread requesting the GIL
> > to be scheduled as soon as possible, and forcibly (rather than any other
> > threads). T
> Should a priority request for the GIL take a priority number?
Er, I prefer to keep things simple. If you have lots of I/O you should probably
use an event loop rather than separate threads.
> Related issue: Should Python threads have priorities? They are after all
> real OS threads.
Well, precisely they are OS threads, and the OS already assigns them (static or
dynamic) priorities. No need to replicate this.
(to answer another notion expressed in another message, there's no "round-robin"
scheduling either)
Regards
Antoine.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
>
> Antoine Pitrou skrev:
> > - priority requests, which is an option for a thread requesting the GIL
> > to be scheduled as soon as possible, and forcibly (rather than any other
> > threads). T
> Should a priority request for the GIL take a priority number?
Er, I prefer to keep things simple. If you have lots of I/O you should probably
use an event loop rather than separate threads.
> Related issue: Should Python threads have priorities? They are after all
> real OS threads.
Well, precisely they are OS threads, and the OS already assigns them (static or
dynamic) priorities. No need to replicate this.
(to answer another notion expressed in another message, there's no "round-robin"
scheduling either)
Regards
Antoine.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
--0016e648c8fc523c1f0476d8eb8e
Content-Type: text/plain; charset=UTF-8
On Mon, Oct 26, 2009 at 10:58 AM, Antoine Pitrou wrote:
> Er, I prefer to keep things simple. If you have lots of I/O you should
> probably
> use an event loop rather than separate threads.
>
On Windows, sometimes using a single-threaded event loop is sometimes
impossible. WaitForMultipleObjects(), which is the Windows equivalent to
select() or poll(), can handle a maximum of only 64 objects.
Do we really need priority requests at all? They seem counter to your
desire for simplicity and allowing the operating system's scheduler to do
its work.
That said, if a thread's time budget is merely paused during I/O rather than
reset, then a thread making frequent (but short) I/O requests cannot starve
the system.
--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC
--0016e648c8fc523c1f0476d8eb8e
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
On Mon, Oct 26, 2009 at 10:58 AM, Antoine Pitrou=
<solipsis@pitr=
ou.net> wrote:
Er, I prefer to keep things simple. If you have lots of I/O you should prob=
ably
use an event loop rather than separate threads.On=
Windows, sometimes using a single-threaded event loop is sometimes impossi=
ble.=C2=A0 WaitForMultipleObjects(), which is the Windows equivalent to sel=
ect() or poll(), can handle a maximum of only 64 objects.
Do we really need priority requests at all?=C2=A0 They seem counter to =
your desire for simplicity and allowing the operating system's schedule=
r to do its work.That said, if a thread's time budget is merely=
paused during I/O rather than reset, then a thread making frequent (but sh=
ort) I/O requests cannot starve the system.
--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprise=
s, LLC
--0016e648c8fc523c1f0476d8eb8e--
Content-Type: text/plain; charset=UTF-8
On Mon, Oct 26, 2009 at 10:58 AM, Antoine Pitrou wrote:
> Er, I prefer to keep things simple. If you have lots of I/O you should
> probably
> use an event loop rather than separate threads.
>
On Windows, sometimes using a single-threaded event loop is sometimes
impossible. WaitForMultipleObjects(), which is the Windows equivalent to
select() or poll(), can handle a maximum of only 64 objects.
Do we really need priority requests at all? They seem counter to your
desire for simplicity and allowing the operating system's scheduler to do
its work.
That said, if a thread's time budget is merely paused during I/O rather than
reset, then a thread making frequent (but short) I/O requests cannot starve
the system.
--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC
--0016e648c8fc523c1f0476d8eb8e
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
On Mon, Oct 26, 2009 at 10:58 AM, Antoine Pitrou=
<solipsis@pitr=
ou.net> wrote:
Er, I prefer to keep things simple. If you have lots of I/O you should prob=
ably
use an event loop rather than separate threads.On=
Windows, sometimes using a single-threaded event loop is sometimes impossi=
ble.=C2=A0 WaitForMultipleObjects(), which is the Windows equivalent to sel=
ect() or poll(), can handle a maximum of only 64 objects.
Do we really need priority requests at all?=C2=A0 They seem counter to =
your desire for simplicity and allowing the operating system's schedule=
r to do its work.That said, if a thread's time budget is merely=
paused during I/O rather than reset, then a thread making frequent (but sh=
ort) I/O requests cannot starve the system.
--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprise=
s, LLC
--0016e648c8fc523c1f0476d8eb8e--
On Sun, Oct 25, 2009 at 1:22 PM, Antoine Pitrou wrote:
> Having other people test it would be fine. Even better if you have an
> actual multi-threaded py3k application. But ccbench results for other
> OSes would be nice too :-)
My results for an 2.4 GHz Intel Core 2 Duo MacBook Pro (OS X 10.5.8):
Control (py3k @ r75723)
--- Throughput ---
Pi calculation (Python)
threads=1: 633 iterations/s.
threads=2: 468 ( 74 %)
threads=3: 443 ( 70 %)
threads=4: 442 ( 69 %)
regular expression (C)
threads=1: 281 iterations/s.
threads=2: 282 ( 100 %)
threads=3: 282 ( 100 %)
threads=4: 282 ( 100 %)
bz2 compression (C)
threads=1: 379 iterations/s.
threads=2: 735 ( 193 %)
threads=3: 733 ( 193 %)
threads=4: 724 ( 190 %)
--- Latency ---
Background CPU task: Pi calculation (Python)
CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 1 ms. (std dev: 1 ms.)
CPU threads=2: 1 ms. (std dev: 2 ms.)
CPU threads=3: 3 ms. (std dev: 6 ms.)
CPU threads=4: 2 ms. (std dev: 3 ms.)
Background CPU task: regular expression (C)
CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 975 ms. (std dev: 577 ms.)
CPU threads=2: 1035 ms. (std dev: 571 ms.)
CPU threads=3: 1098 ms. (std dev: 556 ms.)
CPU threads=4: 1195 ms. (std dev: 557 ms.)
Background CPU task: bz2 compression (C)
CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 0 ms. (std dev: 2 ms.)
CPU threads=2: 4 ms. (std dev: 5 ms.)
CPU threads=3: 0 ms. (std dev: 0 ms.)
CPU threads=4: 1 ms. (std dev: 4 ms.)
Experiment (newgil branch @ r75723)
--- Throughput ---
Pi calculation (Python)
threads=1: 651 iterations/s.
threads=2: 643 ( 98 %)
threads=3: 637 ( 97 %)
threads=4: 625 ( 95 %)
regular expression (C)
threads=1: 298 iterations/s.
threads=2: 296 ( 99 %)
threads=3: 288 ( 96 %)
threads=4: 287 ( 96 %)
bz2 compression (C)
threads=1: 378 iterations/s.
threads=2: 720 ( 190 %)
threads=3: 724 ( 191 %)
threads=4: 718 ( 189 %)
--- Latency ---
Background CPU task: Pi calculation (Python)
CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 0 ms. (std dev: 1 ms.)
CPU threads=2: 0 ms. (std dev: 1 ms.)
CPU threads=3: 0 ms. (std dev: 0 ms.)
CPU threads=4: 1 ms. (std dev: 5 ms.)
Background CPU task: regular expression (C)
CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 1 ms. (std dev: 0 ms.)
CPU threads=2: 2 ms. (std dev: 1 ms.)
CPU threads=3: 2 ms. (std dev: 2 ms.)
CPU threads=4: 2 ms. (std dev: 1 ms.)
Background CPU task: bz2 compression (C)
CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 0 ms. (std dev: 0 ms.)
CPU threads=2: 2 ms. (std dev: 3 ms.)
CPU threads=3: 0 ms. (std dev: 1 ms.)
CPU threads=4: 0 ms. (std dev: 0 ms.)
I also ran this through Unladen Swallow's threading microbenchmark,
which is a straight copy of what David Beazley was experimenting with
(simply iterating over 1000000 ints in pure Python) [1].
"iterative_count" is doing the loops one after the other,
"threaded_count" is doing the loops in parallel using threads.
The results below are benchmarking py3k as the control, newgil as the
experiment. When it says "x% faster", that is a measure of newgil's
performance over py3k's.
With two threads:
iterative_count:
Min: 0.336573 -> 0.387782: 13.21% slower # I've run this
configuration multiple times and gotten the same slowdown.
Avg: 0.338473 -> 0.418559: 19.13% slower
Significant (t=-38.434785, a=0.95)
threaded_count:
Min: 0.529859 -> 0.397134: 33.42% faster
Avg: 0.581786 -> 0.429933: 35.32% faster
Significant (t=70.100445, a=0.95)
With four threads:
iterative_count:
Min: 0.766617 -> 0.734354: 4.39% faster
Avg: 0.771954 -> 0.751374: 2.74% faster
Significant (t=22.164103, a=0.95)
Stddev: 0.00262 -> 0.00891: 70.53% larger
threaded_count:
Min: 1.175750 -> 0.829181: 41.80% faster
Avg: 1.224157 -> 0.867506: 41.11% faster
Significant (t=161.715477, a=0.95)
Stddev: 0.01900 -> 0.01120: 69.65% smaller
With eight threads:
iterative_count:
Min: 1.527794 -> 1.447421: 5.55% faster
Avg: 1.536911 -> 1.479940: 3.85% faster
Significant (t=35.559595, a=0.95)
Stddev: 0.00394 -> 0.01553: 74.61% larger
threaded_count:
Min: 2.424553 -> 1.677180: 44.56% faster
Avg: 2.484922 -> 1.723093: 44.21% faster
Significant (t=184.766131, a=0.95)
Stddev: 0.02874 -> 0.02956: 2.78% larger
I'd be interested in multithreaded benchmarks with less-homogenous workloads.
Collin Winter
[1] - http://code.google.com/p/unladen-swallow/source/browse/tests/performance/bm_threading.py
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
> Having other people test it would be fine. Even better if you have an
> actual multi-threaded py3k application. But ccbench results for other
> OSes would be nice too :-)
My results for an 2.4 GHz Intel Core 2 Duo MacBook Pro (OS X 10.5.8):
Control (py3k @ r75723)
--- Throughput ---
Pi calculation (Python)
threads=1: 633 iterations/s.
threads=2: 468 ( 74 %)
threads=3: 443 ( 70 %)
threads=4: 442 ( 69 %)
regular expression (C)
threads=1: 281 iterations/s.
threads=2: 282 ( 100 %)
threads=3: 282 ( 100 %)
threads=4: 282 ( 100 %)
bz2 compression (C)
threads=1: 379 iterations/s.
threads=2: 735 ( 193 %)
threads=3: 733 ( 193 %)
threads=4: 724 ( 190 %)
--- Latency ---
Background CPU task: Pi calculation (Python)
CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 1 ms. (std dev: 1 ms.)
CPU threads=2: 1 ms. (std dev: 2 ms.)
CPU threads=3: 3 ms. (std dev: 6 ms.)
CPU threads=4: 2 ms. (std dev: 3 ms.)
Background CPU task: regular expression (C)
CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 975 ms. (std dev: 577 ms.)
CPU threads=2: 1035 ms. (std dev: 571 ms.)
CPU threads=3: 1098 ms. (std dev: 556 ms.)
CPU threads=4: 1195 ms. (std dev: 557 ms.)
Background CPU task: bz2 compression (C)
CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 0 ms. (std dev: 2 ms.)
CPU threads=2: 4 ms. (std dev: 5 ms.)
CPU threads=3: 0 ms. (std dev: 0 ms.)
CPU threads=4: 1 ms. (std dev: 4 ms.)
Experiment (newgil branch @ r75723)
--- Throughput ---
Pi calculation (Python)
threads=1: 651 iterations/s.
threads=2: 643 ( 98 %)
threads=3: 637 ( 97 %)
threads=4: 625 ( 95 %)
regular expression (C)
threads=1: 298 iterations/s.
threads=2: 296 ( 99 %)
threads=3: 288 ( 96 %)
threads=4: 287 ( 96 %)
bz2 compression (C)
threads=1: 378 iterations/s.
threads=2: 720 ( 190 %)
threads=3: 724 ( 191 %)
threads=4: 718 ( 189 %)
--- Latency ---
Background CPU task: Pi calculation (Python)
CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 0 ms. (std dev: 1 ms.)
CPU threads=2: 0 ms. (std dev: 1 ms.)
CPU threads=3: 0 ms. (std dev: 0 ms.)
CPU threads=4: 1 ms. (std dev: 5 ms.)
Background CPU task: regular expression (C)
CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 1 ms. (std dev: 0 ms.)
CPU threads=2: 2 ms. (std dev: 1 ms.)
CPU threads=3: 2 ms. (std dev: 2 ms.)
CPU threads=4: 2 ms. (std dev: 1 ms.)
Background CPU task: bz2 compression (C)
CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 0 ms. (std dev: 0 ms.)
CPU threads=2: 2 ms. (std dev: 3 ms.)
CPU threads=3: 0 ms. (std dev: 1 ms.)
CPU threads=4: 0 ms. (std dev: 0 ms.)
I also ran this through Unladen Swallow's threading microbenchmark,
which is a straight copy of what David Beazley was experimenting with
(simply iterating over 1000000 ints in pure Python) [1].
"iterative_count" is doing the loops one after the other,
"threaded_count" is doing the loops in parallel using threads.
The results below are benchmarking py3k as the control, newgil as the
experiment. When it says "x% faster", that is a measure of newgil's
performance over py3k's.
With two threads:
iterative_count:
Min: 0.336573 -> 0.387782: 13.21% slower # I've run this
configuration multiple times and gotten the same slowdown.
Avg: 0.338473 -> 0.418559: 19.13% slower
Significant (t=-38.434785, a=0.95)
threaded_count:
Min: 0.529859 -> 0.397134: 33.42% faster
Avg: 0.581786 -> 0.429933: 35.32% faster
Significant (t=70.100445, a=0.95)
With four threads:
iterative_count:
Min: 0.766617 -> 0.734354: 4.39% faster
Avg: 0.771954 -> 0.751374: 2.74% faster
Significant (t=22.164103, a=0.95)
Stddev: 0.00262 -> 0.00891: 70.53% larger
threaded_count:
Min: 1.175750 -> 0.829181: 41.80% faster
Avg: 1.224157 -> 0.867506: 41.11% faster
Significant (t=161.715477, a=0.95)
Stddev: 0.01900 -> 0.01120: 69.65% smaller
With eight threads:
iterative_count:
Min: 1.527794 -> 1.447421: 5.55% faster
Avg: 1.536911 -> 1.479940: 3.85% faster
Significant (t=35.559595, a=0.95)
Stddev: 0.00394 -> 0.01553: 74.61% larger
threaded_count:
Min: 2.424553 -> 1.677180: 44.56% faster
Avg: 2.484922 -> 1.723093: 44.21% faster
Significant (t=184.766131, a=0.95)
Stddev: 0.02874 -> 0.02956: 2.78% larger
I'd be interested in multithreaded benchmarks with less-homogenous workloads.
Collin Winter
[1] - http://code.google.com/p/unladen-swallow/source/browse/tests/performance/bm_threading.py
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
Collin Winter writes:
>
> My results for an 2.4 GHz Intel Core 2 Duo MacBook Pro (OS X 10.5.8):
Thanks!
[the Dave Beazley benchmark]
> The results below are benchmarking py3k as the control, newgil as the
> experiment. When it says "x% faster", that is a measure of newgil's
> performance over py3k's.
>
> With two threads:
>
> iterative_count:
> Min: 0.336573 -> 0.387782: 13.21% slower # I've run this
> configuration multiple times and gotten the same slowdown.
> Avg: 0.338473 -> 0.418559: 19.13% slower
Those numbers are not very in line with the other "iterative_count" results.
Since iterative_count just runs the loop N times in a row, results should be
proportional to the number N ("number of threads").
Besides, there's no reason for single-threaded performance to be degraded since
the fast path of the eval loop actually got a bit streamlined (there is no
volatile ticker to decrement).
> I'd be interested in multithreaded benchmarks with less-homogenous workloads.
So would I.
Regards
Antoine.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
>
> My results for an 2.4 GHz Intel Core 2 Duo MacBook Pro (OS X 10.5.8):
Thanks!
[the Dave Beazley benchmark]
> The results below are benchmarking py3k as the control, newgil as the
> experiment. When it says "x% faster", that is a measure of newgil's
> performance over py3k's.
>
> With two threads:
>
> iterative_count:
> Min: 0.336573 -> 0.387782: 13.21% slower # I've run this
> configuration multiple times and gotten the same slowdown.
> Avg: 0.338473 -> 0.418559: 19.13% slower
Those numbers are not very in line with the other "iterative_count" results.
Since iterative_count just runs the loop N times in a row, results should be
proportional to the number N ("number of threads").
Besides, there's no reason for single-threaded performance to be degraded since
the fast path of the eval loop actually got a bit streamlined (there is no
volatile ticker to decrement).
> I'd be interested in multithreaded benchmarks with less-homogenous workloads.
So would I.
Regards
Antoine.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
On Mon, Oct 26, 2009 at 2:43 PM, Antoine Pitrou wrote:
> Collin Winter writes:
> [the Dave Beazley benchmark]
>> The results below are benchmarking py3k as the control, newgil as the
>> experiment. When it says "x% faster", that is a measure of newgil's
>> performance over py3k's.
>>
>> With two threads:
>>
>> iterative_count:
>> Min: 0.336573 -> 0.387782: 13.21% slower =A0# I've run this
>> configuration multiple times and gotten the same slowdown.
>> Avg: 0.338473 -> 0.418559: 19.13% slower
>
> Those numbers are not very in line with the other "iterative_count" resul=
ts.
> Since iterative_count just runs the loop N times in a row, results should=
be
> proportional to the number N ("number of threads").
>
> Besides, there's no reason for single-threaded performance to be degraded=
since
> the fast path of the eval loop actually got a bit streamlined (there is no
> volatile ticker to decrement).
I agree those numbers are out of line with the others and make no
sense. I've run it with two threads several times and the results are
consistent on this machine. I'm digging into it a bit more.
Collin
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs=
.net
> Collin Winter writes:
> [the Dave Beazley benchmark]
>> The results below are benchmarking py3k as the control, newgil as the
>> experiment. When it says "x% faster", that is a measure of newgil's
>> performance over py3k's.
>>
>> With two threads:
>>
>> iterative_count:
>> Min: 0.336573 -> 0.387782: 13.21% slower =A0# I've run this
>> configuration multiple times and gotten the same slowdown.
>> Avg: 0.338473 -> 0.418559: 19.13% slower
>
> Those numbers are not very in line with the other "iterative_count" resul=
ts.
> Since iterative_count just runs the loop N times in a row, results should=
be
> proportional to the number N ("number of threads").
>
> Besides, there's no reason for single-threaded performance to be degraded=
since
> the fast path of the eval loop actually got a bit streamlined (there is no
> volatile ticker to decrement).
I agree those numbers are out of line with the others and make no
sense. I've run it with two threads several times and the results are
consistent on this machine. I'm digging into it a bit more.
Collin
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs=
.net
On 04:18 pm, daniel@stutzbachenterprises.com wrote:
>On Mon, Oct 26, 2009 at 10:58 AM, Antoine Pitrou
>wrote:
>>Er, I prefer to keep things simple. If you have lots of I/O you should
>>probably
>>use an event loop rather than separate threads.
>
>On Windows, sometimes using a single-threaded event loop is sometimes
>impossible. WaitForMultipleObjects(), which is the Windows equivalent
>to
>select() or poll(), can handle a maximum of only 64 objects.
This is only partially accurate. For one thing, WaitForMultipleObjects
calls are nestable. For another thing, Windows also has I/O completion
ports which are not limited to 64 event sources. The situation is
actually better than on a lot of POSIXes.
>Do we really need priority requests at all? They seem counter to your
>desire for simplicity and allowing the operating system's scheduler to
>do
>its work.
Despite what I said above, however, I would also take a default position
against adding any kind of more advanced scheduling system here. It
would, perhaps, make sense to expose the APIs for controlling the
platform scheduler, though.
Jean-Paul
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
>On Mon, Oct 26, 2009 at 10:58 AM, Antoine Pitrou
>wrote:
>>Er, I prefer to keep things simple. If you have lots of I/O you should
>>probably
>>use an event loop rather than separate threads.
>
>On Windows, sometimes using a single-threaded event loop is sometimes
>impossible. WaitForMultipleObjects(), which is the Windows equivalent
>to
>select() or poll(), can handle a maximum of only 64 objects.
This is only partially accurate. For one thing, WaitForMultipleObjects
calls are nestable. For another thing, Windows also has I/O completion
ports which are not limited to 64 event sources. The situation is
actually better than on a lot of POSIXes.
>Do we really need priority requests at all? They seem counter to your
>desire for simplicity and allowing the operating system's scheduler to
>do
>its work.
Despite what I said above, however, I would also take a default position
against adding any kind of more advanced scheduling system here. It
would, perhaps, make sense to expose the APIs for controlling the
platform scheduler, though.
Jean-Paul
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
Hello again,
Brett Cannon writes:
>
> I think it's worth it. Removal of the GIL is a totally open-ended problem
> with no solution in sight. This, on the other hand, is a performance benefit
> now. I say move forward with this. If it happens to be short-lived because
> some actually figures out how to remove the GIL then great, but is that
> really going to happen between now and Python 3.2? I doubt it.
Based on this whole discussion, I think I am going to merge the new GIL work
into the py3k branch, with priority requests disabled.
If you think this is premature or uncalled for, or if you just want to review
the changes before making a judgement, please voice up :)
Regards
Antoine.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
On Sun, Nov 1, 2009 at 03:33, Antoine Pitrou wrote:
>
> Hello again,
>
> Brett Cannon writes:
>>
>> I think it's worth it. Removal of the GIL is a totally open-ended problem
>> with no solution in sight. This, on the other hand, is a performance benefit
>> now. I say move forward with this. If it happens to be short-lived because
>> some actually figures out how to remove the GIL then great, but is that
>> really going to happen between now and Python 3.2? I doubt it.
>
> Based on this whole discussion, I think I am going to merge the new GIL work
> into the py3k branch, with priority requests disabled.
This will be a nice Py3K carrot!
>
> If you think this is premature or uncalled for, or if you just want to review
> the changes before making a judgement, please voice up :)
I know I personally trust you to not mess it up, Antoine, but that
might also come from mental exhaustion and laziness. =)
-Brett
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
>
> Hello again,
>
> Brett Cannon writes:
>>
>> I think it's worth it. Removal of the GIL is a totally open-ended problem
>> with no solution in sight. This, on the other hand, is a performance benefit
>> now. I say move forward with this. If it happens to be short-lived because
>> some actually figures out how to remove the GIL then great, but is that
>> really going to happen between now and Python 3.2? I doubt it.
>
> Based on this whole discussion, I think I am going to merge the new GIL work
> into the py3k branch, with priority requests disabled.
This will be a nice Py3K carrot!
>
> If you think this is premature or uncalled for, or if you just want to review
> the changes before making a judgement, please voice up :)
I know I personally trust you to not mess it up, Antoine, but that
might also come from mental exhaustion and laziness. =)
-Brett
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
Antoine Pitrou wrote:
> Based on this whole discussion, I think I am going to merge the new GIL work
> into the py3k branch, with priority requests disabled.
>
> If you think this is premature or uncalled for, or if you just want to review
> the changes before making a judgement, please voice up :)
+1 from me. I trust you like Brett does.
How much work would it cost to make your patch optional at compile time?
For what it's worth we could compare your work on different machines and
on different platforms before it gets enabled by default. Can you
imagine scenarios where your implementation might be slower than the
current GIL implementation?
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
> Based on this whole discussion, I think I am going to merge the new GIL work
> into the py3k branch, with priority requests disabled.
>
> If you think this is premature or uncalled for, or if you just want to review
> the changes before making a judgement, please voice up :)
+1 from me. I trust you like Brett does.
How much work would it cost to make your patch optional at compile time?
For what it's worth we could compare your work on different machines and
on different platforms before it gets enabled by default. Can you
imagine scenarios where your implementation might be slower than the
current GIL implementation?
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
Christian Heimes writes:
>
> +1 from me. I trust you like Brett does.
>
> How much work would it cost to make your patch optional at compile time?
Quite a bit, because it changes the logic for processing asynchronous pending
calls (signals) and asynchronous exceptions in the eval loop. The #defines would
get quite convoluted, I think; I'd prefer not to do that.
> For what it's worth we could compare your work on different machines and
> on different platforms before it gets enabled by default. Can you
> imagine scenarios where your implementation might be slower than the
> current GIL implementation?
I don't really think so. The GIL is taken and released much more predictably
than it was before. The thing that might be worth checking is a workload with
many threads (say 50 or 100). Does anyone have that?
Regards
Antoine.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
>
> +1 from me. I trust you like Brett does.
>
> How much work would it cost to make your patch optional at compile time?
Quite a bit, because it changes the logic for processing asynchronous pending
calls (signals) and asynchronous exceptions in the eval loop. The #defines would
get quite convoluted, I think; I'd prefer not to do that.
> For what it's worth we could compare your work on different machines and
> on different platforms before it gets enabled by default. Can you
> imagine scenarios where your implementation might be slower than the
> current GIL implementation?
I don't really think so. The GIL is taken and released much more predictably
than it was before. The thing that might be worth checking is a workload with
many threads (say 50 or 100). Does anyone have that?
Regards
Antoine.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
Antoine Pitrou wrote:
> Christian Heimes writes:
>> +1 from me. I trust you like Brett does.
>>
>> How much work would it cost to make your patch optional at compile time?
>
> Quite a bit, because it changes the logic for processing asynchronous pending
> calls (signals) and asynchronous exceptions in the eval loop. The #defines would
> get quite convoluted, I think; I'd prefer not to do that.
Based on the new piece of information I totally agree.
> I don't really think so. The GIL is taken and released much more predictably
> than it was before. The thing that might be worth checking is a workload with
> many threads (say 50 or 100). Does anyone have that?
I don't have an application that works on Python 3 and uses that many
threads, sorry.
Christian
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
> Christian Heimes writes:
>> +1 from me. I trust you like Brett does.
>>
>> How much work would it cost to make your patch optional at compile time?
>
> Quite a bit, because it changes the logic for processing asynchronous pending
> calls (signals) and asynchronous exceptions in the eval loop. The #defines would
> get quite convoluted, I think; I'd prefer not to do that.
Based on the new piece of information I totally agree.
> I don't really think so. The GIL is taken and released much more predictably
> than it was before. The thing that might be worth checking is a workload with
> many threads (say 50 or 100). Does anyone have that?
I don't have an application that works on Python 3 and uses that many
threads, sorry.
Christian
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
On Sun, Nov 1, 2009 at 3:33 AM, Antoine Pitrou wrote:
>
> Hello again,
>
> Brett Cannon writes:
>>
>> I think it's worth it. Removal of the GIL is a totally open-ended problem
>> with no solution in sight. This, on the other hand, is a performance benefit
>> now. I say move forward with this. If it happens to be short-lived because
>> some actually figures out how to remove the GIL then great, but is that
>> really going to happen between now and Python 3.2? I doubt it.
>
> Based on this whole discussion, I think I am going to merge the new GIL work
> into the py3k branch, with priority requests disabled.
>
> If you think this is premature or uncalled for, or if you just want to review
> the changes before making a judgement, please voice up :)
+1 Good idea. Thats the best way to make sure this work gets
anywhere. It can be iterated on from there if anyone has objections.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
>
> Hello again,
>
> Brett Cannon writes:
>>
>> I think it's worth it. Removal of the GIL is a totally open-ended problem
>> with no solution in sight. This, on the other hand, is a performance benefit
>> now. I say move forward with this. If it happens to be short-lived because
>> some actually figures out how to remove the GIL then great, but is that
>> really going to happen between now and Python 3.2? I doubt it.
>
> Based on this whole discussion, I think I am going to merge the new GIL work
> into the py3k branch, with priority requests disabled.
>
> If you think this is premature or uncalled for, or if you just want to review
> the changes before making a judgement, please voice up :)
+1 Good idea. Thats the best way to make sure this work gets
anywhere. It can be iterated on from there if anyone has objections.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
> The new GIL does away with this by ditching _Py_Ticker entirely and
> instead using a fixed interval (by default 5 milliseconds, but settable)
> after which we ask the main thread to release the GIL and let another
> thread be scheduled.
I've looked at this part of the implementation, and have a few comments.
a) why is gil_interval a double? Make it an int, counting in
microseconds.
b) notice that, on Windows, minimum wait resolution may be as large as
15ms (e.g. on XP, depending on the hardware). Not sure what this
means for WaitForMultipleObjects; most likely, if you ask for a 5ms
wait, it waits until the next clock tick. It would be bad if, on
some systems, a wait of 5ms would mean that it immediately returns.
c) is the gil_drop_request handling thread-safe? If your answer is
"yes", you should add a comment as to what the assumptions are of
this code (ISTM that multiple threads may simultaneously attempt
to set the drop request, overlapping with the holding thread actually
dropping the GIL). There is also the question whether it is
thread-safe to write into a "volatile int"; I keep forgetting the
answer.
Regards,
Martin
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
> instead using a fixed interval (by default 5 milliseconds, but settable)
> after which we ask the main thread to release the GIL and let another
> thread be scheduled.
I've looked at this part of the implementation, and have a few comments.
a) why is gil_interval a double? Make it an int, counting in
microseconds.
b) notice that, on Windows, minimum wait resolution may be as large as
15ms (e.g. on XP, depending on the hardware). Not sure what this
means for WaitForMultipleObjects; most likely, if you ask for a 5ms
wait, it waits until the next clock tick. It would be bad if, on
some systems, a wait of 5ms would mean that it immediately returns.
c) is the gil_drop_request handling thread-safe? If your answer is
"yes", you should add a comment as to what the assumptions are of
this code (ISTM that multiple threads may simultaneously attempt
to set the drop request, overlapping with the holding thread actually
dropping the GIL). There is also the question whether it is
thread-safe to write into a "volatile int"; I keep forgetting the
answer.
Regards,
Martin
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
Martin v. L=F6wis skrev:
> b) notice that, on Windows, minimum wait resolution may be as large as
> 15ms (e.g. on XP, depending on the hardware). Not sure what this
> means for WaitForMultipleObjects; most likely, if you ask for a 5ms
> wait, it waits until the next clock tick. It would be bad if, on
> some systems, a wait of 5ms would mean that it immediately returns.
> =
Which is why one should use multimedia timers with QPC on Windows.
To get a wait function with much better resolution than Windows' =
default, do this:
1. Set a high resolution with timeBeginPeriod.
2. Loop using a time-out of 0 for WaitForMultipleObjects and put a =
Sleep(0) in the loop not to burn the CPU. Call QPF to get a precise =
timing, and break the loop when the requested time-out has been reached.
3. When you are done, call timeBeginPeriod to turn the multimedia timer off.
This is how you create usleep() in Windows as well: Just loop on QPF and =
Sleep(0) after setting timeBeginPeriod(1).
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs=
.net
> b) notice that, on Windows, minimum wait resolution may be as large as
> 15ms (e.g. on XP, depending on the hardware). Not sure what this
> means for WaitForMultipleObjects; most likely, if you ask for a 5ms
> wait, it waits until the next clock tick. It would be bad if, on
> some systems, a wait of 5ms would mean that it immediately returns.
> =
Which is why one should use multimedia timers with QPC on Windows.
To get a wait function with much better resolution than Windows' =
default, do this:
1. Set a high resolution with timeBeginPeriod.
2. Loop using a time-out of 0 for WaitForMultipleObjects and put a =
Sleep(0) in the loop not to burn the CPU. Call QPF to get a precise =
timing, and break the loop when the requested time-out has been reached.
3. When you are done, call timeBeginPeriod to turn the multimedia timer off.
This is how you create usleep() in Windows as well: Just loop on QPF and =
Sleep(0) after setting timeBeginPeriod(1).
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs=
.net
Sturla Molden wrote:
> Martin v. L=F6wis skrev:
>> b) notice that, on Windows, minimum wait resolution may be as large as
>> 15ms (e.g. on XP, depending on the hardware). Not sure what this
>> means for WaitForMultipleObjects; most likely, if you ask for a 5ms
>> wait, it waits until the next clock tick. It would be bad if, on
>> some systems, a wait of 5ms would mean that it immediately returns.
>> =
> Which is why one should use multimedia timers with QPC on Windows.
Maybe you should study the code under discussion before making such
a proposal.
Regards,
Martin
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs=
.net
> Martin v. L=F6wis skrev:
>> b) notice that, on Windows, minimum wait resolution may be as large as
>> 15ms (e.g. on XP, depending on the hardware). Not sure what this
>> means for WaitForMultipleObjects; most likely, if you ask for a 5ms
>> wait, it waits until the next clock tick. It would be bad if, on
>> some systems, a wait of 5ms would mean that it immediately returns.
>> =
> Which is why one should use multimedia timers with QPC on Windows.
Maybe you should study the code under discussion before making such
a proposal.
Regards,
Martin
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs=
.net
>> c) is the gil_drop_request handling thread-safe? If your answer is
>> "yes", you should add a comment as to what the assumptions are of
>> this code (ISTM that multiple threads may simultaneously attempt
>> to set the drop request, overlapping with the holding thread actually
>> dropping the GIL).
>
> The gil_drop_request is volatile mainly out of precaution, but it's probably not
> necessary. It is only written (set/cleared) when holding the gil_mutex; however,
> it can be read while not holding that mutex. Exceptionally reading the "wrong"
> value would not have any adverse consequences, it would just decrease the
> switching quality at the said instant.
I think it then needs definitely to be volatile - otherwise, the
compiler may chose to cache it in a register (assuming enough registers
are available), instead of re-reading it from memory each time (which is
exactly what volatile then forces it to).
Even if it is read from memory, I still wonder what might happen on
systems that require explicit memory barriers to synchronize across
CPUs. What if CPU 1 keeps reading a 0 value out of its cache, even
though CPU 1 has written an 1 value a long time ago?
IIUC, any (most?) pthread calls would cause synchronization in that
case, which is why applications that also use locks for reading:
http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap04.html#tag_04_10
Of course, on x86, you won't see any issues, because it's cache-coherent
anyway.
Regards,
Martin
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
>> "yes", you should add a comment as to what the assumptions are of
>> this code (ISTM that multiple threads may simultaneously attempt
>> to set the drop request, overlapping with the holding thread actually
>> dropping the GIL).
>
> The gil_drop_request is volatile mainly out of precaution, but it's probably not
> necessary. It is only written (set/cleared) when holding the gil_mutex; however,
> it can be read while not holding that mutex. Exceptionally reading the "wrong"
> value would not have any adverse consequences, it would just decrease the
> switching quality at the said instant.
I think it then needs definitely to be volatile - otherwise, the
compiler may chose to cache it in a register (assuming enough registers
are available), instead of re-reading it from memory each time (which is
exactly what volatile then forces it to).
Even if it is read from memory, I still wonder what might happen on
systems that require explicit memory barriers to synchronize across
CPUs. What if CPU 1 keeps reading a 0 value out of its cache, even
though CPU 1 has written an 1 value a long time ago?
IIUC, any (most?) pthread calls would cause synchronization in that
case, which is why applications that also use locks for reading:
http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap04.html#tag_04_10
Of course, on x86, you won't see any issues, because it's cache-coherent
anyway.
Regards,
Martin
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
> - all machines Python runs on should AFAIK be cache-coherent: CPUs synchronize
> their views of memory in a rather timely fashion.
Ok. I thought that Itanium was an example where this assumption is
actually violated (as many web pages claim such a restriction), however,
it seems that on Itanium, caches are indeed synchronized using MESI.
So claims wrt. lack of cache consistency on Itanium, and the need for
barrier instruction, seem to be caused by the Itanium feature that
allows the processor to fetch memory out-of-order, i.e. an earlier read
may see a later memory state. This is apparently used to satisfy reads
as soon as the cache line is read (so that the cache line can be
discarded earlier). Wrt. to your requirement ("rather timely fashion",
this still seems to be fine).
Still, this all needs to be documented in the code :-)
Regards,
Martin
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
> their views of memory in a rather timely fashion.
Ok. I thought that Itanium was an example where this assumption is
actually violated (as many web pages claim such a restriction), however,
it seems that on Itanium, caches are indeed synchronized using MESI.
So claims wrt. lack of cache consistency on Itanium, and the need for
barrier instruction, seem to be caused by the Itanium feature that
allows the processor to fetch memory out-of-order, i.e. an earlier read
may see a later memory state. This is apparently used to satisfy reads
as soon as the cache line is read (so that the cache line can be
discarded earlier). Wrt. to your requirement ("rather timely fashion",
this still seems to be fine).
Still, this all needs to be documented in the code :-)
Regards,
Martin
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
Martin v. L=F6wis skrev:
> Maybe you should study the code under discussion before making such
> a proposal.
>
I did, and it does nothing of what I suggested. I am sure I can make the =
Windows GIL
in ceval_gil.h and the mutex in thread_nt.h at lot more precise and =
efficient.
This is the kind of code I was talking about, from ceval_gil.h:
r =3D WaitForMultipleObjects(2, objects, TRUE, milliseconds);
I would turn on multimedia timer (it is not on by default), and replace =
this
call with a loop, approximately like this:
for (;;) {
r =3D WaitForMultipleObjects(2, objects, TRUE, 0);
/* blah blah blah */ =
QueryPerformanceCounter(&cnt); =
if (cnt > timeout) break;
Sleep(0);
}
And the timeout "milliseconds" would now be computed from querying the =
performance
counter, instead of unreliably by the Windows NT kernel.
Sturla
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs=
.net
> Maybe you should study the code under discussion before making such
> a proposal.
>
I did, and it does nothing of what I suggested. I am sure I can make the =
Windows GIL
in ceval_gil.h and the mutex in thread_nt.h at lot more precise and =
efficient.
This is the kind of code I was talking about, from ceval_gil.h:
r =3D WaitForMultipleObjects(2, objects, TRUE, milliseconds);
I would turn on multimedia timer (it is not on by default), and replace =
this
call with a loop, approximately like this:
for (;;) {
r =3D WaitForMultipleObjects(2, objects, TRUE, 0);
/* blah blah blah */ =
QueryPerformanceCounter(&cnt); =
if (cnt > timeout) break;
Sleep(0);
}
And the timeout "milliseconds" would now be computed from querying the =
performance
counter, instead of unreliably by the Windows NT kernel.
Sturla
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs=
.net
Sturla Molden writes:
>
> And the timeout "milliseconds" would now be computed from querying the
> performance
> counter, instead of unreliably by the Windows NT kernel.
Could you actually test your proposal under Windows and report what kind of
concrete benefits it brings?
Thank you
Antoine.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
>
> And the timeout "milliseconds" would now be computed from querying the
> performance
> counter, instead of unreliably by the Windows NT kernel.
Could you actually test your proposal under Windows and report what kind of
concrete benefits it brings?
Thank you
Antoine.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
Sturla Molden skrev:
>
> I would turn on multimedia timer (it is not on by default), and
> replace this
> call with a loop, approximately like this:
>
> for (;;) {
> r = WaitForMultipleObjects(2, objects, TRUE, 0);
> /* blah blah blah */ QueryPerformanceCounter(&cnt); if (cnt >
> timeout) break;
> Sleep(0);
> }
And just so you don't ask: There should not just be a Sleep(0) in the
loop, but a sleep that gets shorter and shorter until a lower threshold
is reached, where it skips to Sleep(0). That way we avoid hammering om
WaitForMultipleObjects and QueryPerformanceCounter more than we need.
And for all that to work better than just giving a timeout to
WaitForMultipleObjects, we need the multimedia timer turned on.
Sturla
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
>
> I would turn on multimedia timer (it is not on by default), and
> replace this
> call with a loop, approximately like this:
>
> for (;;) {
> r = WaitForMultipleObjects(2, objects, TRUE, 0);
> /* blah blah blah */ QueryPerformanceCounter(&cnt); if (cnt >
> timeout) break;
> Sleep(0);
> }
And just so you don't ask: There should not just be a Sleep(0) in the
loop, but a sleep that gets shorter and shorter until a lower threshold
is reached, where it skips to Sleep(0). That way we avoid hammering om
WaitForMultipleObjects and QueryPerformanceCounter more than we need.
And for all that to work better than just giving a timeout to
WaitForMultipleObjects, we need the multimedia timer turned on.
Sturla
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
Sturla Molden skrev:
>
> And just so you don't ask: There should not just be a Sleep(0) in the =
> loop, but a sleep that gets shorter and shorter until a lower =
> threshold is reached, where it skips to Sleep(0). That way we avoid =
> hammering om WaitForMultipleObjects and QueryPerformanceCounter more =
> than we need. And for all that to work better than just giving a =
> timeout to WaitForMultipleObjects, we need the multimedia timer turned =
> on.
The important thing about multimedia timer is that the granularity of =
Sleep() and WaitForMultipleObjects() by default is "10 ms or at most 20 =
ms". But if we call
timeBeginPeriod(1);
the MM timer is on and granularity becomes 1 ms or at most 2 ms. But we =
can get even more precise than that by hammering on Sleep(0) for =
timeouts less than 2 ms. We can get typical granularity in the order of =
10 =B5s, with the occational 100 =B5s now and then. I know this because I =
was using Windows 2000 to generate TTL signals on the LPT port some =
years ago, and watched them on the oscilloscope.
~ 15 ms granularity is Windows default. But that is brain dead.
By the way Antoine, if you think granularity of 1-2 ms is sufficient, =
i.e. no need for =B5s precision, then just calling timeBeginPeriod(1) will =
be sufficient.
Sturla
=
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs=
.net
>
> And just so you don't ask: There should not just be a Sleep(0) in the =
> loop, but a sleep that gets shorter and shorter until a lower =
> threshold is reached, where it skips to Sleep(0). That way we avoid =
> hammering om WaitForMultipleObjects and QueryPerformanceCounter more =
> than we need. And for all that to work better than just giving a =
> timeout to WaitForMultipleObjects, we need the multimedia timer turned =
> on.
The important thing about multimedia timer is that the granularity of =
Sleep() and WaitForMultipleObjects() by default is "10 ms or at most 20 =
ms". But if we call
timeBeginPeriod(1);
the MM timer is on and granularity becomes 1 ms or at most 2 ms. But we =
can get even more precise than that by hammering on Sleep(0) for =
timeouts less than 2 ms. We can get typical granularity in the order of =
10 =B5s, with the occational 100 =B5s now and then. I know this because I =
was using Windows 2000 to generate TTL signals on the LPT port some =
years ago, and watched them on the oscilloscope.
~ 15 ms granularity is Windows default. But that is brain dead.
By the way Antoine, if you think granularity of 1-2 ms is sufficient, =
i.e. no need for =B5s precision, then just calling timeBeginPeriod(1) will =
be sufficient.
Sturla
=
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs=
.net
Sturla Molden writes:
>
> By the way Antoine, if you think granularity of 1-2 ms is sufficient,
It certainly is.
But once again, I'm no Windows developer and I don't have a native Windost host
to test on; therefore someone else (you?) has to try.
Also, the MSDN doc (*) says timeBeginPeriod() can have a detrimental effect on
system performance; I don't know how much of it is true.
(*) http://msdn.microsoft.com/en-us/library/dd757624(VS.85).aspx
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
>
> By the way Antoine, if you think granularity of 1-2 ms is sufficient,
It certainly is.
But once again, I'm no Windows developer and I don't have a native Windost host
to test on; therefore someone else (you?) has to try.
Also, the MSDN doc (*) says timeBeginPeriod() can have a detrimental effect on
system performance; I don't know how much of it is true.
(*) http://msdn.microsoft.com/en-us/library/dd757624(VS.85).aspx
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
Antoine Pitrou skrev:
> It certainly is.
> But once again, I'm no Windows developer and I don't have a native Windost host
> to test on; therefore someone else (you?) has to try.
>
I'd love to try, but I don't have VC++ to build Python, I use GCC on
Windows.
Anyway, the first thing to try then is to call
timeBeginPeriod(1);
once on startup, and leave the rest of the code as it is. If 2-4 ms is
sufficient we can use timeBeginPeriod(2), etc. Microsoft is claiming
Windows performs better with high granularity, which is why it is 10 ms
by default.
Sturla
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
> It certainly is.
> But once again, I'm no Windows developer and I don't have a native Windost host
> to test on; therefore someone else (you?) has to try.
>
I'd love to try, but I don't have VC++ to build Python, I use GCC on
Windows.
Anyway, the first thing to try then is to call
timeBeginPeriod(1);
once on startup, and leave the rest of the code as it is. If 2-4 ms is
sufficient we can use timeBeginPeriod(2), etc. Microsoft is claiming
Windows performs better with high granularity, which is why it is 10 ms
by default.
Sturla
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
Sturla Molden wrote:
> Antoine Pitrou skrev:
>> It certainly is.
>> But once again, I'm no Windows developer and I don't have a native
>> Windost host
>> to test on; therefore someone else (you?) has to try.
>>
> I'd love to try, but I don't have VC++ to build Python, I use GCC on
> Windows.
>
> Anyway, the first thing to try then is to call
>
> timeBeginPeriod(1);
>
> once on startup, and leave the rest of the code as it is. If 2-4 ms is
> sufficient we can use timeBeginPeriod(2), etc. Microsoft is claiming
> Windows performs better with high granularity, which is why it is 10 ms
> by default.
>
>
> Sturla
That page claims:
Windows uses the lowest value (that is, highest resolution) requested
by any process.
I would posit that the chance of having some random process on your
machine request a high-speed timer is high enough that the overhead for
Python doing the same is probably low.
John
=:->
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
> Antoine Pitrou skrev:
>> It certainly is.
>> But once again, I'm no Windows developer and I don't have a native
>> Windost host
>> to test on; therefore someone else (you?) has to try.
>>
> I'd love to try, but I don't have VC++ to build Python, I use GCC on
> Windows.
>
> Anyway, the first thing to try then is to call
>
> timeBeginPeriod(1);
>
> once on startup, and leave the rest of the code as it is. If 2-4 ms is
> sufficient we can use timeBeginPeriod(2), etc. Microsoft is claiming
> Windows performs better with high granularity, which is why it is 10 ms
> by default.
>
>
> Sturla
That page claims:
Windows uses the lowest value (that is, highest resolution) requested
by any process.
I would posit that the chance of having some random process on your
machine request a high-speed timer is high enough that the overhead for
Python doing the same is probably low.
John
=:->
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
Sturla Molden writes:
>
> I'd love to try, but I don't have VC++ to build Python, I use GCC on
> Windows.
You can use Visual Studio Express, which is free (gratis).
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
>
> I'd love to try, but I don't have VC++ to build Python, I use GCC on
> Windows.
You can use Visual Studio Express, which is free (gratis).
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
> I did, and it does nothing of what I suggested. I am sure I can make the
> Windows GIL in ceval_gil.h and the mutex in thread_nt.h at lot more precise and
> efficient.
Hmm. I'm skeptical that your code makes it more accurate, and I
completely fail to see that it makes it more efficient (by what
measurement of efficiency?)
Also, why would making it more accurate make it better? IIUC,
accuracy is completely irrelevant here, though efficiency
(low overhead) does matter.
> This is the kind of code I was talking about, from ceval_gil.h:
>
> r = WaitForMultipleObjects(2, objects, TRUE, milliseconds);
>
> I would turn on multimedia timer (it is not on by default), and replace
> this
> call with a loop, approximately like this:
>
> for (;;) {
> r = WaitForMultipleObjects(2, objects, TRUE, 0);
> /* blah blah blah */ QueryPerformanceCounter(&cnt); if (cnt >
> timeout) break;
> Sleep(0);
> }
>
> And the timeout "milliseconds" would now be computed from querying the
> performance counter, instead of unreliably by the Windows NT kernel.
Hmm. This creates a busy wait loop; if you add larger sleep values,
then it loses accuracy.
Why not just call timeBeginPeriod, and then rely on the higher clock
rate for WaitForMultipleObjects?
Regards,
Martin
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
> Windows GIL in ceval_gil.h and the mutex in thread_nt.h at lot more precise and
> efficient.
Hmm. I'm skeptical that your code makes it more accurate, and I
completely fail to see that it makes it more efficient (by what
measurement of efficiency?)
Also, why would making it more accurate make it better? IIUC,
accuracy is completely irrelevant here, though efficiency
(low overhead) does matter.
> This is the kind of code I was talking about, from ceval_gil.h:
>
> r = WaitForMultipleObjects(2, objects, TRUE, milliseconds);
>
> I would turn on multimedia timer (it is not on by default), and replace
> this
> call with a loop, approximately like this:
>
> for (;;) {
> r = WaitForMultipleObjects(2, objects, TRUE, 0);
> /* blah blah blah */ QueryPerformanceCounter(&cnt); if (cnt >
> timeout) break;
> Sleep(0);
> }
>
> And the timeout "milliseconds" would now be computed from querying the
> performance counter, instead of unreliably by the Windows NT kernel.
Hmm. This creates a busy wait loop; if you add larger sleep values,
then it loses accuracy.
Why not just call timeBeginPeriod, and then rely on the higher clock
rate for WaitForMultipleObjects?
Regards,
Martin
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
--0016e648f2c4a39309047766df68
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
On Mon, Nov 2, 2009 at 11:27 AM, "Martin v. L=C3=B6wis" wrote:
> Hmm. This creates a busy wait loop; if you add larger sleep values,
> then it loses accuracy.
>
I thought that at first, too, but then I checked the documentation for
Sleep(0):
"A value of zero causes the thread to relinquish the remainder of its time
slice to any other thread of equal priority that is ready to run."
(this is not to say that I think the solution with Sleep is worthwhile,
though...)
--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC
--0016e648f2c4a39309047766df68
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
On Mon, Nov 2, 2009 at 11:27 AM, "Martin v.=
L=C3=B6wis" <martin@v.loewis.de> wrote:
Hmm. This creates a busy wait loop; if you add larger sleep values,
then it loses accuracy.I thought that at first, t=
oo, but then I checked the documentation for Sleep(0):"A value=
of zero causes the thread to relinquish the remainder of its
time slice to any other thread of equal priority that is ready to run."=
;(this is not to say that I think the solution with Sleep is worthw=
hile, though...)
--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprise=
s, LLC
--0016e648f2c4a39309047766df68--
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
On Mon, Nov 2, 2009 at 11:27 AM, "Martin v. L=C3=B6wis" wrote:
> Hmm. This creates a busy wait loop; if you add larger sleep values,
> then it loses accuracy.
>
I thought that at first, too, but then I checked the documentation for
Sleep(0):
"A value of zero causes the thread to relinquish the remainder of its time
slice to any other thread of equal priority that is ready to run."
(this is not to say that I think the solution with Sleep is worthwhile,
though...)
--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC
--0016e648f2c4a39309047766df68
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
On Mon, Nov 2, 2009 at 11:27 AM, "Martin v.=
L=C3=B6wis" <martin@v.loewis.de> wrote:
Hmm. This creates a busy wait loop; if you add larger sleep values,
then it loses accuracy.I thought that at first, t=
oo, but then I checked the documentation for Sleep(0):"A value=
of zero causes the thread to relinquish the remainder of its
time slice to any other thread of equal priority that is ready to run."=
;(this is not to say that I think the solution with Sleep is worthw=
hile, though...)
--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprise=
s, LLC
--0016e648f2c4a39309047766df68--
Martin v. L=F6wis skrev:
>> I did, and it does nothing of what I suggested. I am sure I can make the
>> Windows GIL in ceval_gil.h and the mutex in thread_nt.h at lot more prec=
ise and
>> efficient.
>> =
>
> Hmm. I'm skeptical that your code makes it more accurate, and I
> completely fail to see that it makes it more efficient (by what
> measurement of efficiency?)
>
> Also, why would making it more accurate make it better? IIUC,
> accuracy is completely irrelevant here, though efficiency
> (low overhead) does matter.
>
> =
>> This is the kind of code I was talking about, from ceval_gil.h:
>>
>> r =3D WaitForMultipleObjects(2, objects, TRUE, milliseconds);
>>
>> I would turn on multimedia timer (it is not on by default), and replace
>> this
>> call with a loop, approximately like this:
>>
>> for (;;) {
>> r =3D WaitForMultipleObjects(2, objects, TRUE, 0);
>> /* blah blah blah */ QueryPerformanceCounter(&cnt); if (cnt >
>> timeout) break;
>> Sleep(0);
>> }
>>
>> And the timeout "milliseconds" would now be computed from querying the
>> performance counter, instead of unreliably by the Windows NT kernel.
>> =
>
> Hmm. This creates a busy wait loop; if you add larger sleep values,
> then it loses accuracy.
>
> =
Actually an usleep lookes like this, and the call to the wait function =
must go into the for loop. But no, it's not a busy sleep.
static int inited =3D 0;
static __int64 hz;
static double dhz;
const double sleep_granularity =3D 2.0E10-3;
void usleep( long us )
{
__int64 cnt, end;
double diff;
if (!inited) {
timeBeginPeriod(1);
QueryPerformanceFrequency((LARGE_INTEGER*)&hz);
dhz =3D (double)hz;
inited =3D 1;
} =
QueryPerformanceCounter((LARGE_INTEGER*)&cnt);
end =3D cnt + (__int64)(1.0E10-6 * (double)(us) * dhz);
for (;;) {
QueryPerformanceCounter((LARGE_INTEGER*)&cnt);
if (cnt >=3D end) break;
diff =3D (double)(end - cnt)/dhz;
if (diff > sleep_granularity)
Sleep((DWORD)(diff - sleep_granularity));
else
Sleep(0); =
}
}
> Why not just call timeBeginPeriod, and then rely on the higher clock
> rate for WaitForMultipleObjects?
>
> =
That is what I suggested when Antoine said 1-2 ms was enough.
Sturla
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs=
.net
>> I did, and it does nothing of what I suggested. I am sure I can make the
>> Windows GIL in ceval_gil.h and the mutex in thread_nt.h at lot more prec=
ise and
>> efficient.
>> =
>
> Hmm. I'm skeptical that your code makes it more accurate, and I
> completely fail to see that it makes it more efficient (by what
> measurement of efficiency?)
>
> Also, why would making it more accurate make it better? IIUC,
> accuracy is completely irrelevant here, though efficiency
> (low overhead) does matter.
>
> =
>> This is the kind of code I was talking about, from ceval_gil.h:
>>
>> r =3D WaitForMultipleObjects(2, objects, TRUE, milliseconds);
>>
>> I would turn on multimedia timer (it is not on by default), and replace
>> this
>> call with a loop, approximately like this:
>>
>> for (;;) {
>> r =3D WaitForMultipleObjects(2, objects, TRUE, 0);
>> /* blah blah blah */ QueryPerformanceCounter(&cnt); if (cnt >
>> timeout) break;
>> Sleep(0);
>> }
>>
>> And the timeout "milliseconds" would now be computed from querying the
>> performance counter, instead of unreliably by the Windows NT kernel.
>> =
>
> Hmm. This creates a busy wait loop; if you add larger sleep values,
> then it loses accuracy.
>
> =
Actually an usleep lookes like this, and the call to the wait function =
must go into the for loop. But no, it's not a busy sleep.
static int inited =3D 0;
static __int64 hz;
static double dhz;
const double sleep_granularity =3D 2.0E10-3;
void usleep( long us )
{
__int64 cnt, end;
double diff;
if (!inited) {
timeBeginPeriod(1);
QueryPerformanceFrequency((LARGE_INTEGER*)&hz);
dhz =3D (double)hz;
inited =3D 1;
} =
QueryPerformanceCounter((LARGE_INTEGER*)&cnt);
end =3D cnt + (__int64)(1.0E10-6 * (double)(us) * dhz);
for (;;) {
QueryPerformanceCounter((LARGE_INTEGER*)&cnt);
if (cnt >=3D end) break;
diff =3D (double)(end - cnt)/dhz;
if (diff > sleep_granularity)
Sleep((DWORD)(diff - sleep_granularity));
else
Sleep(0); =
}
}
> Why not just call timeBeginPeriod, and then rely on the higher clock
> rate for WaitForMultipleObjects?
>
> =
That is what I suggested when Antoine said 1-2 ms was enough.
Sturla
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs=
.net
--0016e68dec2e0eaaa50477c5c1eb
Content-Type: text/plain; charset=ISO-8859-1
Hi Antoine,
I was finally able to compile py3k and run the benchmark (my compilation
issue was caused by checking out on Windows and compiling on Unix. Some
Makefile templates are missing correct EOL properties in SVN I think).
The benchmark results can be obtained from:
http://gaiacrtn.free.fr/py/benchmark-newgil/benchmark-newgil.tar.bz2
and viewed from:
http://gaiacrtn.free.fr/py/benchmark-newgil/
I ran the benchmark on two platforms:
- Solaris X86, 16 cores: some python extension are likely missing (see
config.log)
- Windows XP SP3, 4 cores: all python extensions but TCL (I didn't bother
checking why it failed as it is not used in the benchmark). It is a release
build.
The results look promising but I let you share your conclusion (some latency
results seem a bit strange from my understanding).
Side-note: PCBuild requires nasmw.exe but it no longer exists in the latest
version. I had to rename nasm.exe to nasmw.exe. Would be nice to add this to
the readme to avoid confusion...
Baptiste.
2009/11/1 Antoine Pitrou
>
> Hello again,
>
> Brett Cannon writes:
> >
> > I think it's worth it. Removal of the GIL is a totally open-ended problem
> > with no solution in sight. This, on the other hand, is a performance
> benefit
> > now. I say move forward with this. If it happens to be short-lived
> because
> > some actually figures out how to remove the GIL then great, but is that
> > really going to happen between now and Python 3.2? I doubt it.
>
> Based on this whole discussion, I think I am going to merge the new GIL
> work
> into the py3k branch, with priority requests disabled.
>
--0016e68dec2e0eaaa50477c5c1eb
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Hi Antoine,I was finally able to compile py3k and run the benchmark (my=
compilation issue was caused by checking out on Windows and compiling on U=
nix. Some Makefile templates are missing correct EOL properties in SVN I th=
ink).
The benchmark results can be obtained from:http://gaiacrtn.fr=
ee.fr/py/benchmark-newgil/benchmark-newgil.tar.bz2and viewed from:
=
http://gaiacrtn.free.fr/py/benchmark-newgil/
I ran the benchmark on two platforms:Solaris X86, 16 cores:=
some python extension are likely missing (see config.log)Windows =
XP SP3, 4 cores: all python extensions but TCL (I didn't bother checkin=
g why it failed as it is not used in the benchmark). It is a release build.=
The results look promising but I let you share your conclusion (s=
ome latency results seem a bit strange from my understanding).Side-=
note: PCBuild requires nasmw.exe but it no longer exists in the latest vers=
ion. I had to rename nasm.exe to nasmw.exe. Would be nice to add this to th=
e readme to avoid confusion...
Baptiste.2009/11/1 Antoine Pitrou <solipsis@pitrou.net>
Hello again,
Brett Cannon <brett <at> python.org> writes:
>
> I think it's worth it. Removal of the GIL is a totally open-ended =
problem
> with no solution in sight. This, on the other hand, is a performance b=
enefit
> now. I say move forward with this. If it happens to be short-lived bec=
ause
> some actually figures out how to remove the GIL then great, but is tha=
t
> really going to happen between now and Python 3.2? I doubt it.
Based on this whole discussion, I think I am going to merge the new GIL wor=
k
into the py3k branch, with priority requests disabled.
--0016e68dec2e0eaaa50477c5c1eb--
Content-Type: text/plain; charset=ISO-8859-1
Hi Antoine,
I was finally able to compile py3k and run the benchmark (my compilation
issue was caused by checking out on Windows and compiling on Unix. Some
Makefile templates are missing correct EOL properties in SVN I think).
The benchmark results can be obtained from:
http://gaiacrtn.free.fr/py/benchmark-newgil/benchmark-newgil.tar.bz2
and viewed from:
http://gaiacrtn.free.fr/py/benchmark-newgil/
I ran the benchmark on two platforms:
- Solaris X86, 16 cores: some python extension are likely missing (see
config.log)
- Windows XP SP3, 4 cores: all python extensions but TCL (I didn't bother
checking why it failed as it is not used in the benchmark). It is a release
build.
The results look promising but I let you share your conclusion (some latency
results seem a bit strange from my understanding).
Side-note: PCBuild requires nasmw.exe but it no longer exists in the latest
version. I had to rename nasm.exe to nasmw.exe. Would be nice to add this to
the readme to avoid confusion...
Baptiste.
2009/11/1 Antoine Pitrou
>
> Hello again,
>
> Brett Cannon writes:
> >
> > I think it's worth it. Removal of the GIL is a totally open-ended problem
> > with no solution in sight. This, on the other hand, is a performance
> benefit
> > now. I say move forward with this. If it happens to be short-lived
> because
> > some actually figures out how to remove the GIL then great, but is that
> > really going to happen between now and Python 3.2? I doubt it.
>
> Based on this whole discussion, I think I am going to merge the new GIL
> work
> into the py3k branch, with priority requests disabled.
>
--0016e68dec2e0eaaa50477c5c1eb
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Hi Antoine,I was finally able to compile py3k and run the benchmark (my=
compilation issue was caused by checking out on Windows and compiling on U=
nix. Some Makefile templates are missing correct EOL properties in SVN I th=
ink).
The benchmark results can be obtained from:http://gaiacrtn.fr=
ee.fr/py/benchmark-newgil/benchmark-newgil.tar.bz2and viewed from:
=
http://gaiacrtn.free.fr/py/benchmark-newgil/
I ran the benchmark on two platforms:Solaris X86, 16 cores:=
some python extension are likely missing (see config.log)Windows =
XP SP3, 4 cores: all python extensions but TCL (I didn't bother checkin=
g why it failed as it is not used in the benchmark). It is a release build.=
The results look promising but I let you share your conclusion (s=
ome latency results seem a bit strange from my understanding).Side-=
note: PCBuild requires nasmw.exe but it no longer exists in the latest vers=
ion. I had to rename nasm.exe to nasmw.exe. Would be nice to add this to th=
e readme to avoid confusion...
Baptiste.2009/11/1 Antoine Pitrou <solipsis@pitrou.net>
Hello again,
Brett Cannon <brett <at> python.org> writes:
>
> I think it's worth it. Removal of the GIL is a totally open-ended =
problem
> with no solution in sight. This, on the other hand, is a performance b=
enefit
> now. I say move forward with this. If it happens to be short-lived bec=
ause
> some actually figures out how to remove the GIL then great, but is tha=
t
> really going to happen between now and Python 3.2? I doubt it.
Based on this whole discussion, I think I am going to merge the new GIL wor=
k
into the py3k branch, with priority requests disabled.
--0016e68dec2e0eaaa50477c5c1eb--
Hello,
> Solaris X86, 16 cores: some python extension are likely missing (see
config.log)
> Windows XP SP3, 4 cores: all python extensions but TCL (I didn't bother
checking why it failed as it is not used in the benchmark). It is a release
build.
>
> The results look promising but I let you share your conclusion (some latency
results seem a bit strange from my understanding).
Thank you! The latency results don't look that strange to me.
If you're surprised that py3k shows better latency than newgil on the "pi
calculation" workload, it's easy to understand why: py3k speculatively releases
the GIL so often on that workload (every 100 opcodes) that the latencies are
indeed very good, but if you look at the corresponding throughput numbers they
are dismal (your Solaris box shows it falling to less than 20% with two threads
running compared to the baseline number for single-thread execution, and on your
Windows box the number is hardly better with 45%).
So, to sum it up, the way the current GIL manages to have good latencies is by
issueing an unreasonable number of system calls on a contended lock, and
potentially killing throughput performance (this depends on the OS too, because
numbers under Linux are not so catastrophic).
The new GIL, on the other hand, is much more balanced in that it achieves rather
predictable latencies (especially when you don't overcommit the OS by issueing
more computational threads than you have CPU cores) while preserving throughput
performance.
> Side-note: PCBuild requires nasmw.exe but it no longer exists in the latest
> version. I had to rename nasm.exe to nasmw.exe. Would be nice to add this to
> the readme to avoid confusion...
You should file a bug on http://bugs.python.org
Regards
Antoine.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
Antoine,
How close are you to merging this into the Py3k branch? It looks like
a solid piece of work, that can only get better in the period between
now and the release of 3.2. But I don't want to rush you, and I only
have had a brief look at your code. (I whipped up a small Dave Beazley
example and was impressed by the performance of your code compared to
the original py3k branch -- from 150% to 100% CPU usage according to
top with only 2 threads.)
My only suggestion so far: the description could use more explicit
documentation on the various variables and macros and how they
combine.
I also expect that priority requests aren't important; it somehow
seems strange that if multiple threads are all doing I/O each new
thread whose I/O completes would get to preempt whoever else is active
immediately. (Also the choice of *not* making a priority request when
a read returns no bytes seems strange 00 if I read the code
correctly.)
Anyway, thanks for this work!
--
--Guido van Rossum (python.org/~guido)
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
How close are you to merging this into the Py3k branch? It looks like
a solid piece of work, that can only get better in the period between
now and the release of 3.2. But I don't want to rush you, and I only
have had a brief look at your code. (I whipped up a small Dave Beazley
example and was impressed by the performance of your code compared to
the original py3k branch -- from 150% to 100% CPU usage according to
top with only 2 threads.)
My only suggestion so far: the description could use more explicit
documentation on the various variables and macros and how they
combine.
I also expect that priority requests aren't important; it somehow
seems strange that if multiple threads are all doing I/O each new
thread whose I/O completes would get to preempt whoever else is active
immediately. (Also the choice of *not* making a priority request when
a read returns no bytes seems strange 00 if I read the code
correctly.)
Anyway, thanks for this work!
--
--Guido van Rossum (python.org/~guido)
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
Hello Guido,
> How close are you to merging this into the Py3k branch? It looks like
> a solid piece of work, that can only get better in the period between
> now and the release of 3.2. But I don't want to rush you, and I only
> have had a brief look at your code.
The code is ready. Priority requests are already disabled, I'm just
wondering whether to remove them from the code, or leave them there in
case someone thinks they're useful. I suppose removing them is ok.
> My only suggestion so far: the description could use more explicit
> documentation on the various variables and macros and how they
> combine.
Is it before or after
http://mail.python.org/pipermail/python-checkins/2009-November/087482.html ?
Regards
Antoine.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
On Sat, Nov 7, 2009 at 9:01 AM, Antoine Pitrou wrote:
>
> Hello Guido,
>
>> How close are you to merging this into the Py3k branch? It looks like
>> a solid piece of work, that can only get better in the period between
>> now and the release of 3.2. But I don't want to rush you, and I only
>> have had a brief look at your code.
>
> The code is ready. Priority requests are already disabled, I'm just
> wondering whether to remove them from the code, or leave them there in
> case someone thinks they're useful. I suppose removing them is ok.
I would remove them -- if and when we find there's a need for
something like them I suspect it won't look like what you currently
have, so it's better not to complexificate your current patch with
them. (I would remove all traces, including the conditions passed into
the end macros.)
>> My only suggestion so far: the description could use more explicit
>> documentation on the various variables and macros and how they
>> combine.
>
> Is it before or after
> http://mail.python.org/pipermail/python-checkins/2009-November/087482.html ?
After. While that is already really helpful, not all the code is
easily linked back to paragraphs in that comment block, and some
variables are not mentioned by name in the block.
--
--Guido van Rossum (python.org/~guido)
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
>
> Hello Guido,
>
>> How close are you to merging this into the Py3k branch? It looks like
>> a solid piece of work, that can only get better in the period between
>> now and the release of 3.2. But I don't want to rush you, and I only
>> have had a brief look at your code.
>
> The code is ready. Priority requests are already disabled, I'm just
> wondering whether to remove them from the code, or leave them there in
> case someone thinks they're useful. I suppose removing them is ok.
I would remove them -- if and when we find there's a need for
something like them I suspect it won't look like what you currently
have, so it's better not to complexificate your current patch with
them. (I would remove all traces, including the conditions passed into
the end macros.)
>> My only suggestion so far: the description could use more explicit
>> documentation on the various variables and macros and how they
>> combine.
>
> Is it before or after
> http://mail.python.org/pipermail/python-checkins/2009-November/087482.html ?
After. While that is already really helpful, not all the code is
easily linked back to paragraphs in that comment block, and some
variables are not mentioned by name in the block.
--
--Guido van Rossum (python.org/~guido)
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
--0016e6d2621783b8570477cd18dc
Content-Type: text/plain; charset=ISO-8859-1
2009/11/7 Antoine Pitrou
>
> [...]
> So, to sum it up, the way the current GIL manages to have good latencies is
> by
> issueing an unreasonable number of system calls on a contended lock, and
> potentially killing throughput performance (this depends on the OS too,
> because
> numbers under Linux are not so catastrophic).
>
Ah, I remember reading this in the analysis that was published now!
I made another benchmark using one of my script I ported to python 3 (and
simplified a bit). I only test the total execution time. Tests done on
Windows XP SP3. Processor is an Intel Core 2 Quad Q9300 (4 cores).
You can get the script from:
http://gaiacrtn.free.fr/py/benchmark-kwd-newgil/purekeyword-py26-3k.py
Script + test doc (940KB):
http://gaiacrtn.free.fr/py/benchmark-kwd-newgil/benchmark-kwd-newgil.tar.bz2
The threaded loop is:
for match in self.punctuation_pattern.finditer( document ):
word = document[last_start_index:match.start()]
if len(word) > 1 and len(word) < MAX_KEYWORD_LENGTH:
words[word] = words.get(word, 0) + 1
last_start_index = match.end()
if word:
word_count += 1
Here are the results:
-j0 (main thread only)
2.5.2 : 17.991s, 17.947s, 17.780s
2.6.2 : 19.071s, 19.023s, 19.054s
3.1.1 : 46.384s, 46.321s, 46.425s
newgil: 47.483s, 47.605s, 47.512s
-j4 (4 consumer threads, main thread producing/waiting)
2.5.2 : 31.105s, 30.568s
2.6.2 : 31.550s, 30.599s
3.1.1 : 85.114s, 85.185s
newgil: 48.428s, 49.217s
It shows that, on my platform for this specific benchmark:
- newgil manage to leverage a significant amount of parallelism (1.7)
where python 3.1 does not (3.1 is 80% slower)
- newgil as a small impact on non multi-threaded execution (~1-2%) [may
be worth investigating]
- 3.1 is more than 2 times slower than python 2.6 on this benchmark
- 2.6 is "only" 65% slower when run with multiple threads compared to the
80% slower of 3.1.
Newgil is a vast improvement as it manages to leverage the short time the
GIL is released by finditer [if I understood correctly in 3.x regex release
the GIL].
What's worry me is the single threaded performance degradation between 2.6
and 3.1 on this test. Could the added GIL release/acquire on each finditer
call explain this?
--0016e6d2621783b8570477cd18dc
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
2009/11/7 Antoine Pitrou <solipsis@pitrou.net=
>
[...]
So, to sum it up, the way the current GIL manages to have good latencies is=
by
issueing an unreasonable number of system calls on a contended lock, and
potentially killing throughput performance (this depends on the OS too, bec=
ause
numbers under Linux are not so catastrophic).Ah, I re=
member reading this in the analysis that was published now!I made a=
nother benchmark using one of my script I ported to python 3 (and simplifie=
d a bit). I only test the total execution time. Tests done on Windows XP SP=
3. Processor is an Intel Core 2 Quad Q9300 (4 cores).
You can get the script from:http://gaiacrtn.free.fr/py/benc=
hmark-kwd-newgil/purekeyword-py26-3k.pyScript + test doc (940KB):
http://gaiacrtn.free.fr/py/benchmark-kwd-newgil/benchmark-kwd=
-newgil.tar.bz2The threaded loop is:for match in self.punct=
uation_pattern.finditer( document ):
=A0=A0=A0 word =3D document[last_start_index:match.start()]=A0=A0=A0 if=
len(word) > 1 and len(word) < MAX_KEYWORD_LENGTH:=A0=A0=A0=A0=A0=
=A0=A0 words[word] =3D words.get(word, 0) + 1=A0=A0=A0 last_start_index=
=3D match.end()=A0=A0=A0 if word:
=A0=A0=A0=A0=A0=A0=A0 word_count +=3D 1Here are the results:-j0 (main thread only)2.5.2 : 17.991s, 17.947s, 17.780s2.6.2 : 19.=
071s, 19.023s, 19.054s3.1.1 : 46.384s, 46.321s, 46.425snewgil: 47.4=
83s, 47.605s, 47.512s
-j4 (4 consumer threads, main thread producing/waiting)2.5.2 : 31.1=
05s, 30.568s2.6.2 : 31.550s, 30.599s3.1.1 : 85.114s, 85.185snew=
gil: 48.428s, 49.217sIt shows that, on my platform for =
this specific benchmark:
=A0newgil manage to leverage a significant amount of parallelism (1=
.7) where python 3.1 does not (3.1 is 80% slower)newgil as a small=
impact on non multi-threaded execution (~1-2%) [may be worth investigating=
]
3.1 is more than 2 times slower than python 2.6 on this benchmark2.6 is "only" 65% slower when run with multiple threads c=
ompared to the 80% slower of 3.1.Newgil is a vast improvement as =
it manages to leverage the short time the GIL is released by finditer [if I=
understood correctly in 3.x regex release the GIL].
What's worry me is the single threaded performance degradation betw=
een 2.6 and 3.1 on this test. Could the added GIL release/acquire on each f=
inditer call explain this?
--0016e6d2621783b8570477cd18dc--
Content-Type: text/plain; charset=ISO-8859-1
2009/11/7 Antoine Pitrou
>
> [...]
> So, to sum it up, the way the current GIL manages to have good latencies is
> by
> issueing an unreasonable number of system calls on a contended lock, and
> potentially killing throughput performance (this depends on the OS too,
> because
> numbers under Linux are not so catastrophic).
>
Ah, I remember reading this in the analysis that was published now!
I made another benchmark using one of my script I ported to python 3 (and
simplified a bit). I only test the total execution time. Tests done on
Windows XP SP3. Processor is an Intel Core 2 Quad Q9300 (4 cores).
You can get the script from:
http://gaiacrtn.free.fr/py/benchmark-kwd-newgil/purekeyword-py26-3k.py
Script + test doc (940KB):
http://gaiacrtn.free.fr/py/benchmark-kwd-newgil/benchmark-kwd-newgil.tar.bz2
The threaded loop is:
for match in self.punctuation_pattern.finditer( document ):
word = document[last_start_index:match.start()]
if len(word) > 1 and len(word) < MAX_KEYWORD_LENGTH:
words[word] = words.get(word, 0) + 1
last_start_index = match.end()
if word:
word_count += 1
Here are the results:
-j0 (main thread only)
2.5.2 : 17.991s, 17.947s, 17.780s
2.6.2 : 19.071s, 19.023s, 19.054s
3.1.1 : 46.384s, 46.321s, 46.425s
newgil: 47.483s, 47.605s, 47.512s
-j4 (4 consumer threads, main thread producing/waiting)
2.5.2 : 31.105s, 30.568s
2.6.2 : 31.550s, 30.599s
3.1.1 : 85.114s, 85.185s
newgil: 48.428s, 49.217s
It shows that, on my platform for this specific benchmark:
- newgil manage to leverage a significant amount of parallelism (1.7)
where python 3.1 does not (3.1 is 80% slower)
- newgil as a small impact on non multi-threaded execution (~1-2%) [may
be worth investigating]
- 3.1 is more than 2 times slower than python 2.6 on this benchmark
- 2.6 is "only" 65% slower when run with multiple threads compared to the
80% slower of 3.1.
Newgil is a vast improvement as it manages to leverage the short time the
GIL is released by finditer [if I understood correctly in 3.x regex release
the GIL].
What's worry me is the single threaded performance degradation between 2.6
and 3.1 on this test. Could the added GIL release/acquire on each finditer
call explain this?
--0016e6d2621783b8570477cd18dc
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
2009/11/7 Antoine Pitrou <solipsis@pitrou.net=
>
[...]
So, to sum it up, the way the current GIL manages to have good latencies is=
by
issueing an unreasonable number of system calls on a contended lock, and
potentially killing throughput performance (this depends on the OS too, bec=
ause
numbers under Linux are not so catastrophic).Ah, I re=
member reading this in the analysis that was published now!I made a=
nother benchmark using one of my script I ported to python 3 (and simplifie=
d a bit). I only test the total execution time. Tests done on Windows XP SP=
3. Processor is an Intel Core 2 Quad Q9300 (4 cores).
You can get the script from:http://gaiacrtn.free.fr/py/benc=
hmark-kwd-newgil/purekeyword-py26-3k.pyScript + test doc (940KB):
http://gaiacrtn.free.fr/py/benchmark-kwd-newgil/benchmark-kwd=
-newgil.tar.bz2The threaded loop is:for match in self.punct=
uation_pattern.finditer( document ):
=A0=A0=A0 word =3D document[last_start_index:match.start()]=A0=A0=A0 if=
len(word) > 1 and len(word) < MAX_KEYWORD_LENGTH:=A0=A0=A0=A0=A0=
=A0=A0 words[word] =3D words.get(word, 0) + 1=A0=A0=A0 last_start_index=
=3D match.end()=A0=A0=A0 if word:
=A0=A0=A0=A0=A0=A0=A0 word_count +=3D 1Here are the results:-j0 (main thread only)2.5.2 : 17.991s, 17.947s, 17.780s2.6.2 : 19.=
071s, 19.023s, 19.054s3.1.1 : 46.384s, 46.321s, 46.425snewgil: 47.4=
83s, 47.605s, 47.512s
-j4 (4 consumer threads, main thread producing/waiting)2.5.2 : 31.1=
05s, 30.568s2.6.2 : 31.550s, 30.599s3.1.1 : 85.114s, 85.185snew=
gil: 48.428s, 49.217sIt shows that, on my platform for =
this specific benchmark:
=A0newgil manage to leverage a significant amount of parallelism (1=
.7) where python 3.1 does not (3.1 is 80% slower)newgil as a small=
impact on non multi-threaded execution (~1-2%) [may be worth investigating=
]
3.1 is more than 2 times slower than python 2.6 on this benchmark2.6 is "only" 65% slower when run with multiple threads c=
ompared to the 80% slower of 3.1.Newgil is a vast improvement as =
it manages to leverage the short time the GIL is released by finditer [if I=
understood correctly in 3.x regex release the GIL].
What's worry me is the single threaded performance degradation betw=
een 2.6 and 3.1 on this test. Could the added GIL release/acquire on each f=
inditer call explain this?
--0016e6d2621783b8570477cd18dc--
Hello again,
> It shows that, on my platform for this specific benchmark:
> * newgil manage to leverage a significant amount of parallelism
> (1.7) where python 3.1 does not (3.1 is 80% slower)
I think you are mistaken:
-j0 (main thread only)
newgil: 47.483s, 47.605s, 47.512s
-j4 (4 consumer threads, main thread producing/waiting)
newgil: 48.428s, 49.217s
The runtimes are actually the same, so newgil doesn't leverage anything.
However, it doesn't degrade performance like 2.x/3.1 does :-)
> * newgil as a small impact on non multi-threaded execution
> (~1-2%) [may be worth investigating]
It goes from very slightly slower to very slightly faster and it is
likely to be caused by variations in generated output from the compiler.
> * 3.1 is more than 2 times slower than python 2.6 on this
> benchmark
That's the most worrying outcome I'd say. Are you sure the benchmark
really does the same thing? Under 2.6, you should add re.UNICODE to the
regular expression flags so as to match the 3.x semantics.
> [if I understood correctly in 3.x regex release the GIL].
Unless I've missed something it doesn't, no.
This could be a separate opportunity for optimization, if someone wants
to take a look at it.
Regards
Antoine.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
--0016e6435394b61b470477dc9949
Content-Type: text/plain; charset=ISO-8859-1
2009/11/7 Antoine Pitrou
>
> Hello again,
>
> > It shows that, on my platform for this specific benchmark:
> > * newgil manage to leverage a significant amount of parallelism
> > (1.7) where python 3.1 does not (3.1 is 80% slower)
>
> I think you are mistaken:
>
> -j0 (main thread only)
> newgil: 47.483s, 47.605s, 47.512s
> -j4 (4 consumer threads, main thread producing/waiting)
> newgil: 48.428s, 49.217s
>
> The runtimes are actually the same, so newgil doesn't leverage anything.
> However, it doesn't degrade performance like 2.x/3.1 does :-)
>
Ooops, I was comparing to 3.1 -j4 times which make no sense. One would think
I wanted to see that result since I though the GIL was released :/. This
greatly reduce the interest of this benchmark...
> > * 3.1 is more than 2 times slower than python 2.6 on this
> > benchmark
>
> That's the most worrying outcome I'd say. Are you sure the benchmark
> really does the same thing? Under 2.6, you should add re.UNICODE to the
> regular expression flags so as to match the 3.x semantics.
>
I've tried, but there is no change in result (the regexp does not use \w &
co but specify a lot unicode ranges). All strings are already of unicode
type in 2.6.
> > [if I understood correctly in 3.x regex release the GIL].
>
> Unless I've missed something it doesn't, no.
>
Hmmm, I was confusing with other modules (bzip2 & hashlib?). Looking back at
the result of your benchmark it's obvious. Is there a place where the list
of functions releasing the GIL is available? I did not see anything in
bz2.compress documentation.
--0016e6435394b61b470477dc9949
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
2009/11/7 Antoine Pitrou <solipsis@pitrou.net=
>
Hello again,
> It shows that, on my platform for this specific benchmark:
> =A0 =A0 =A0 * =A0newgil manage to leverage a significant amount of par=
allelism
> =A0 =A0 =A0 =A0 (1.7) where python 3.1 does not (3.1 is 80% slower)
I think you are mistaken:
-j0 (main thread only)
newgil: 47.483s, 47.605s, 47.512s
-j4 (4 consumer threads, main thread producing/waiting)
newgil: 48.428s, 49.217s
The runtimes are actually the same, so newgil doesn't leverage an=
ything.
However, it doesn't degrade performance like 2.x/3.1 does :-)Ooops, I was comparing to 3.1 -j4 times which make no sense=
. One would think I wanted to see that result since I though the GIL was re=
leased :/. This greatly reduce the interest of this benchmark...
=A0
> =A0 =A0 =A0 * 3.1 is more than 2 times slower than python 2.6 on this
> =A0 =A0 =A0 =A0 benchmark
That's the most worrying outcome I'd say. Are you sure the benchmar=
k
really does the same thing? Under 2.6, you should add re.UNICODE to the
regular expression flags so as to match the 3.x semantics.=
I've tried, but there is no change in result (the regexp does =
not use \w & co but specify a lot unicode ranges). All strings are alre=
ady of unicode type in 2.6.
=A0
> [if I understood correctly in 3.x regex release the GIL].
Unless I've missed something it doesn't, no.=
Hmmm, I was confusing with other modules (bzip2 & hashlib?). Looki=
ng back at the result of your benchmark it's obvious. Is there a place =
where the list of functions releasing the GIL is available? I did not see a=
nything in bz2.compress documentation.
--0016e6435394b61b470477dc9949--
Content-Type: text/plain; charset=ISO-8859-1
2009/11/7 Antoine Pitrou
>
> Hello again,
>
> > It shows that, on my platform for this specific benchmark:
> > * newgil manage to leverage a significant amount of parallelism
> > (1.7) where python 3.1 does not (3.1 is 80% slower)
>
> I think you are mistaken:
>
> -j0 (main thread only)
> newgil: 47.483s, 47.605s, 47.512s
> -j4 (4 consumer threads, main thread producing/waiting)
> newgil: 48.428s, 49.217s
>
> The runtimes are actually the same, so newgil doesn't leverage anything.
> However, it doesn't degrade performance like 2.x/3.1 does :-)
>
Ooops, I was comparing to 3.1 -j4 times which make no sense. One would think
I wanted to see that result since I though the GIL was released :/. This
greatly reduce the interest of this benchmark...
> > * 3.1 is more than 2 times slower than python 2.6 on this
> > benchmark
>
> That's the most worrying outcome I'd say. Are you sure the benchmark
> really does the same thing? Under 2.6, you should add re.UNICODE to the
> regular expression flags so as to match the 3.x semantics.
>
I've tried, but there is no change in result (the regexp does not use \w &
co but specify a lot unicode ranges). All strings are already of unicode
type in 2.6.
> > [if I understood correctly in 3.x regex release the GIL].
>
> Unless I've missed something it doesn't, no.
>
Hmmm, I was confusing with other modules (bzip2 & hashlib?). Looking back at
the result of your benchmark it's obvious. Is there a place where the list
of functions releasing the GIL is available? I did not see anything in
bz2.compress documentation.
--0016e6435394b61b470477dc9949
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
2009/11/7 Antoine Pitrou <solipsis@pitrou.net=
>
Hello again,
> It shows that, on my platform for this specific benchmark:
> =A0 =A0 =A0 * =A0newgil manage to leverage a significant amount of par=
allelism
> =A0 =A0 =A0 =A0 (1.7) where python 3.1 does not (3.1 is 80% slower)
I think you are mistaken:
-j0 (main thread only)
newgil: 47.483s, 47.605s, 47.512s
-j4 (4 consumer threads, main thread producing/waiting)
newgil: 48.428s, 49.217s
The runtimes are actually the same, so newgil doesn't leverage an=
ything.
However, it doesn't degrade performance like 2.x/3.1 does :-)Ooops, I was comparing to 3.1 -j4 times which make no sense=
. One would think I wanted to see that result since I though the GIL was re=
leased :/. This greatly reduce the interest of this benchmark...
=A0
> =A0 =A0 =A0 * 3.1 is more than 2 times slower than python 2.6 on this
> =A0 =A0 =A0 =A0 benchmark
That's the most worrying outcome I'd say. Are you sure the benchmar=
k
really does the same thing? Under 2.6, you should add re.UNICODE to the
regular expression flags so as to match the 3.x semantics.=
I've tried, but there is no change in result (the regexp does =
not use \w & co but specify a lot unicode ranges). All strings are alre=
ady of unicode type in 2.6.
=A0
> [if I understood correctly in 3.x regex release the GIL].
Unless I've missed something it doesn't, no.=
Hmmm, I was confusing with other modules (bzip2 & hashlib?). Looki=
ng back at the result of your benchmark it's obvious. Is there a place =
where the list of functions releasing the GIL is available? I did not see a=
nything in bz2.compress documentation.
--0016e6435394b61b470477dc9949--
Baptiste Lepilleur writes:
>
> I've tried, but there is no change in result (the regexp does not use \w &
> co but specify a lot unicode ranges). All strings are already of unicode
> type in 2.6.
No they aren't. You should add "from __future__ import unicode_literals" at the
start of your script and run it again.
> Hmmm, I was confusing with other modules (bzip2 & hashlib?). Looking back at
> the result of your benchmark it's obvious. Is there a place where the list of
> functions releasing the GIL is available? I did not see anything in
> bz2.compress documentation.
No, there isn't. You'd have to test, or read the source code.
But bz2 and zlib, for example, do release the GIL.
Regards
Antoine.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
>
> I've tried, but there is no change in result (the regexp does not use \w &
> co but specify a lot unicode ranges). All strings are already of unicode
> type in 2.6.
No they aren't. You should add "from __future__ import unicode_literals" at the
start of your script and run it again.
> Hmmm, I was confusing with other modules (bzip2 & hashlib?). Looking back at
> the result of your benchmark it's obvious. Is there a place where the list of
> functions releasing the GIL is available? I did not see anything in
> bz2.compress documentation.
No, there isn't. You'd have to test, or read the source code.
But bz2 and zlib, for example, do release the GIL.
Regards
Antoine.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
Hello again,
I've now removed priority requests, tried to improve the internal doc a
bit, and merged the changes into py3k.
Afterwards, the new Windows 7 buildbot has hung in test_multiprocessing,
but I don't know whether it's related.
Regards
Antoine.
Guido van Rossum writes:
>
>
> I would remove them -- if and when we find there's a need for
> something like them I suspect it won't look like what you currently
> have, so it's better not to complexificate your current patch with
> them. (I would remove all traces, including the conditions passed into
> the end macros.)
>
> >> My only suggestion so far: the description could use more explicit
> >> documentation on the various variables and macros and how they
> >> combine.
> >
> > Is it before or after
> > http://mail.python.org/pipermail/python-checkins/2009-
November/087482.html ?
>
> After. While that is already really helpful, not all the code is
> easily linked back to paragraphs in that comment block, and some
> variables are not mentioned by name in the block.
>
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/bull%40pubbs.net
\mi
Related Threads
- Google Summer of Code - ofbiz
- RE: Find Oracle Client Version - from the Database side - oracle
- [PATCH V2 2/4] nodemask: fix the declaration of NODEMASK_ALLOC() - kernel
- mysql proxy in production? - mysql
- [PHP] PHP MySQL Insert Statements - php
- Bug report for Tomcat 5 [2010/03/14] - tomcat
- VOIP Today Magazine 6th issue - openbsd
- About Security:cipher and Suhosin patch - cakephp
- [css3-multicol] Typos and related issues - web
- [Hendrix] 3.6.2 very stable - firefox
- problem sorting - ruby
- OpenBSD 4.6 - openbsd