o Currently, requests from multiple queues can be dispatched to the disk at
the same time. This is true for hardware which supports queuing. So if a
disk supports a queue depth of 31, it is possible that 20 requests are
dispatched from queue 1 and then the next queue is scheduled in, which
dispatches more requests.
o This multiple queue dispatch introduces issues for accurate accounting of
disk time consumed by a particular queue. For example, if one async queue
is scheduled in, it can dispatch 31 requests to the disk and then it will
be expired and a new sync queue might get scheduled in. These 31 requests
might take a long time to finish but this time is never accounted to the
async queue which dispatched these requests.
o This patch introduces functionality where we wait for all the requests
from the previous queue to finish before the next queue is scheduled in.
That way a queue is more accurately charged for the disk time it has
consumed. Note this still does not take care of errors introduced by disk
write caching.
o Because the above behavior can result in reduced throughput, it is
enabled only if the user sets the "fairness" tunable to 2 or higher.
o This patch helps in achieving more isolation between reads and buffered
writes in different cgroups. Buffered writes typically utilize the full
queue depth and then expire the queue. On the contrary, sequential reads
typically drive a queue depth of 1. So despite the fact that writes are
using more disk time, it is never accounted to the write queue because we
don't wait for requests to finish after dispatching them. This patch helps
do more accurate accounting of disk time, especially for buffered writes,
hence providing better fairness and better isolation between two cgroups
running read and write workloads.
Signed-off-by: Vivek Goyal
---
block/elevator-fq.c | 31 ++++++++++++++++++++++++++++++-
1 files changed, 30 insertions(+), 1 deletions(-)
diff --git a/block/elevator-fq.c b/block/elevator-fq.c
index 68be1dc..7609579 100644
--- a/block/elevator-fq.c
+++ b/block/elevator-fq.c
@@ -2038,7 +2038,7 @@ STORE_FUNCTION(elv_slice_sync_store, &efqd->elv_slice[1], 1, UINT_MAX, 1);
EXPORT_SYMBOL(elv_slice_sync_store);
STORE_FUNCTION(elv_slice_async_store, &efqd->elv_slice[0], 1, UINT_MAX, 1);
EXPORT_SYMBOL(elv_slice_async_store);
-STORE_FUNCTION(elv_fairness_store, &efqd->fairness, 0, 1, 0);
+STORE_FUNCTION(elv_fairness_store, &efqd->fairness, 0, 2, 0);
EXPORT_SYMBOL(elv_fairness_store);
#undef STORE_FUNCTION
@@ -2952,6 +2952,24 @@ void *elv_fq_select_ioq(struct request_queue *q, int force)
}
expire:
+ if (efqd->fairness >= 2 && !force && ioq && ioq->dispatched) {
+ /*
+ * If there are request dispatched from this queue, don't
+ * dispatch requests from new queue till all the requests from
+ * this queue have completed.
+ *
+ * This helps in attributing right amount of disk time consumed
+ * by a particular queue when hardware allows queuing.
+ *
+ * Set ioq = NULL so that no more requests are dispatched from
+ * this queue.
+ */
+ elv_log_ioq(efqd, ioq, "select: wait for requests to finish"
+ " disp=%lu", ioq->dispatched);
+ ioq = NULL;
+ goto keep_queue;
+ }
+
elv_ioq_slice_expired(q);
new_queue:
ioq = elv_set_active_ioq(q, new_ioq);
@@ -3109,6 +3127,17 @@ void elv_ioq_completed_request(struct request_queue *q, struct request *rq)
*/
elv_ioq_arm_slice_timer(q, 1);
} else {
+ /* If fairness >=2 and there are requests
+ * dispatched from this queue, don't dispatch
+ * new requests from a different queue till
+ * all requests from this queue have finished.
+ * This helps in attributing right disk time
+ * to a queue when hardware supports queuing.
+ */
+
+ if (efqd->fairness >= 2 && ioq->dispatched)
+ goto done;
+
/* Expire the queue */
elv_ioq_slice_expired(q);
}
--
1.6.0.6
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
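The accounting effect described in the changelog can be illustrated with a
small user-space model. This is only a sketch under simplifying assumptions,
not kernel code: struct toy_ioq, toy_must_wait() and toy_run_slice() are
hypothetical names, the disk is assumed to complete one request every
service_ms milliseconds, and the queue is assumed to dispatch its whole burst
at the start of its slice. Only the wait condition itself is taken from the
patch.

```c
#include <assert.h>

/* Hypothetical stand-in for an io queue; not the kernel's structure. */
struct toy_ioq {
	unsigned long dispatched;	/* requests on the disk, not yet completed */
	unsigned long charged_ms;	/* disk time accounted to this queue */
};

/*
 * Same condition the patch adds before expiring a queue: with
 * fairness >= 2, a queue that still has requests in flight is not
 * expired unless the caller forces it.
 */
static int toy_must_wait(int fairness, int force, const struct toy_ioq *ioq)
{
	return fairness >= 2 && !force && ioq && ioq->dispatched;
}

/*
 * Model one slice: the queue dispatches "burst" requests at t = 0 and
 * the scheduler then tries to expire it. Returns the time (in ms since
 * slice start) at which the queue is actually expired, and charges
 * that time to the queue.
 */
static unsigned long toy_run_slice(struct toy_ioq *ioq, int fairness,
				   unsigned long burst,
				   unsigned long service_ms)
{
	unsigned long now = 0;

	ioq->dispatched = burst;

	/* Assumption: the disk finishes one request per service_ms. */
	while (toy_must_wait(fairness, 0, ioq)) {
		now += service_ms;
		ioq->dispatched--;
	}

	/* With fairness >= 2 the queue must be fully drained here. */
	assert(fairness < 2 || ioq->dispatched == 0);

	ioq->charged_ms += now;
	return now;
}
```

In this model, a buffered writer that fills a queue depth of 31 with requests
of 4 ms each is charged 0 ms with fairness set to 1, because it is expired
while all 31 requests are still in flight, and 124 ms with fairness set to 2,
because expiry is deferred until the last request completes.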
Conversations: [RFC] IO scheduler based IO controller V6
- [RFC] IO scheduler based IO controller V6 by Vivek Goyal on 2009-07-02T20:04:34+00:00
- [PATCH 01/25] io-controller: Documentation by Vivek Goyal on 2009-07-02T20:05:04+00:00
- [PATCH 05/25] io-controller: Charge for time slice based on average disk rate by Vivek Goyal on 2009-07-02T20:05:04+00:00
- [PATCH 14/25] io-controller: Separate out queue and data by Vivek Goyal on 2009-07-02T20:05:05+00:00
- [PATCH 23/25] io-controller: Support per cgroup per device weights and io class by Vivek Goyal on 2009-07-02T20:05:37+00:00
- [PATCH 03/25] io-controller: bfq support of in-class preemption by Vivek Goyal on 2009-07-02T20:05:54+00:00
- [PATCH 22/25] io-controller: Per io group bdi congestion interface by Vivek Goyal on 2009-07-02T20:06:19+00:00
- [PATCH 10/25] io-controller: cfq changes to use hierarchical fair queuing code in elevaotor layer by Vivek Goyal on 2009-07-02T20:06:19+00:00
- [PATCH 16/25] io-controller: noop changes for hierarchical fair queuing by Vivek Goyal on 2009-07-02T20:06:51+00:00
- [PATCH 21/25] io-controller: Per cgroup request descriptor support by Vivek Goyal on 2009-07-02T20:06:51+00:00
- [PATCH 18/25] io-controller: anticipatory changes for hierarchical fair queuing by Vivek Goyal on 2009-07-02T20:07:22+00:00
- [PATCH 19/25] blkio_cgroup patches from Ryo to track async bios. by Vivek Goyal on 2009-07-02T20:07:22+00:00
- [PATCH 15/25] io-conroller: Prepare elevator layer for single queue schedulers by Vivek Goyal on 2009-07-02T20:07:55+00:00
- [PATCH 06/25] io-controller: Modify cfq to make use of flat elevator fair queuing by Vivek Goyal on 2009-07-02T20:08:20+00:00
- [PATCH 02/25] io-controller: Core of the B-WF2Q+ scheduler by Vivek Goyal on 2009-07-02T20:08:39+00:00
- [PATCH 13/25] io-controller: Wait for requests to complete from last queue before new queue is scheduled by Vivek Goyal on 2009-07-02T20:08:39+00:00
- [PATCH 24/25] io-controller: Debug hierarchical IO scheduling by Vivek Goyal on 2009-07-02T20:09:19+00:00
- [PATCH 08/25] io-controller: cgroup related changes for hierarchical group support by Vivek Goyal on 2009-07-02T20:09:19+00:00
- [PATCH 17/25] io-controller: deadline changes for hierarchical fair queuing by Vivek Goyal on 2009-07-02T20:09:45+00:00
- [PATCH 09/25] io-controller: Common hierarchical fair queuing code in elevaotor layer by Vivek Goyal on 2009-07-02T20:11:01+00:00
- [PATCH 11/25] io-controller: Export disk time used and nr sectors dipatched through cgroups by Vivek Goyal on 2009-07-02T20:11:08+00:00
- [PATCH 20/25] io-controller: map async requests to appropriate cgroup by Vivek Goyal on 2009-07-02T20:11:50+00:00
- Re: [PATCH 13/25] io-controller: Wait for requests to complete from last queue before new queue is scheduled by Nauman Rafique on 2009-07-02T20:12:01+00:00
- [PATCH 12/25] io-controller: idle for sometime on sync queue before expiring it by Vivek Goyal on 2009-07-02T20:12:18+00:00
- Re: [PATCH 09/25] io-controller: Common hierarchical fair queuing code in elevaotor layer by Gui Jianfeng on 2009-07-06T02:47:16+00:00
- Re: [PATCH 11/25] io-controller: Export disk time used and nr sectors dipatched through cgroups by Gui Jianfeng on 2009-07-08T02:17:54+00:00
- Re: [PATCH 21/25] io-controller: Per cgroup request descriptor support by Gui Jianfeng on 2009-07-08T03:28:26+00:00
- Re: [RFC] IO scheduler based IO controller V6 by Balbir Singh on 2009-07-08T03:56:43+00:00
- [PATCH] io-controller: implement per group request allocation limitation by Gui Jianfeng on 2009-07-10T01:57:27+00:00
- Re: [PATCH 21/25] io-controller: Per cgroup request descriptor support by Gui Jianfeng on 2009-07-21T05:39:05+00:00
- Re: [PATCH 21/25] io-controller: Per cgroup request descriptor support by Nauman Rafique on 2009-07-21T05:55:51+00:00
- Re: [RFC] IO scheduler based IO controller V6 by Gui Jianfeng on 2009-07-27T02:12:20+00:00