Megatest

Check-in [2635b582e7]
Overview
Comment: Gate test launch based on journal load. Values from load calc seem wrong. Should be 0-1.0 but seeing integers 0, 1, 2 ...
SHA1: 2635b582e7273323b216f9f1266066a0841abb04
User & Date: mrwellan on 2024-07-10 18:10:05
Context
2024-07-10
20:11 Force values to be real in journal stats collection. still broken though (check-in: c906466bb0 user: matt tags: v1.81-journal-based-throttling)
18:10 Gate test launch based on journal load. Values from load calc seem wrong. Should be 0-1.0 but seeing integers 0, 1, 2 ... (check-in: 2635b582e7 user: mrwellan tags: v1.81-journal-based-throttling)
17:44 Added journal based statical droop based throttling of queries. (check-in: fc6b05f924 user: mrwellan tags: v1.81-journal-based-throttling)
Changes

Modified runs.scm from [0cd899f860] to [dadc9aecb3].

@@ -1145,15 +1145,27 @@
 	 (registry-mutex         (runs:dat-registry-mutex runsdat))
 	 (flags                  (runs:dat-flags runsdat))
 	 (keyvals                (runs:dat-keyvals runsdat))
 	 (run-info               (runs:dat-run-info runsdat))
 	 (all-tests-registry     (runs:dat-all-tests-registry runsdat))
 	 (run-limits-info        (runs:dat-can-run-more-tests runsdat))
 	 ;; (runs:can-run-more-tests run-id jobgroup max-concurrent-jobs)) ;; look at the test jobgroup and tot jobs running
-	 (have-resources         (car run-limits-info))
+	 (have-resources         (and (if *journal-stats*
+					  (let* ((dbfname (conc
+							   (dbfile:run-id->dbnum run-id)
+							   ".db"))
+						 (stats (tt:get-journal-stats))
+						 (load  (or (alist-ref dbfname stats equal?) 0)))
+					    (if (> load 0.1) ;; dbs too busy to start more tests
+						(begin
+						 (debug:print-info 0 *default-log-port* "Gating launch due to db load "load" based on journal file observations for "dbfname)
+						 #f)
+						#t))
+					  #t) ;; if journal monitoring not started do not gate
+				      (car run-limits-info)))
 	 (num-running            (list-ref run-limits-info 1))
 	 (num-running-in-jobgroup(list-ref run-limits-info 2)) 
 	 (max-concurrent-jobs    (list-ref run-limits-info 3))
 	 (job-group-limit        (list-ref run-limits-info 4))
 	 ;; (prereqs-not-met        (rmt:get-prereqs-not-met run-id waitons hed item-path mode: testmode itemmaps: itemmaps))
 	 ;; (prereqs-not-met         (mt:lazy-get-prereqs-not-met run-id waitons item-path mode: testmode itemmap: itemmap))
 	 (fails                  (if (list? prereqs-not-met) ;; TODO: rename fails to failed-prereqs

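Read in isolation, the new have-resources binding in runs.scm looks up the load recorded for the run's db file and refuses to launch when that load exceeds 0.1, while still requiring the original resource check to pass. The following is a minimal standalone sketch of that decision, assuming the stats are an alist of (db-file-name . load) as tt:get-journal-stats appears to return, or #f when journal monitoring has not been started. The names load-threshold, lookup-load and launch-allowed? are hypothetical and do not exist in Megatest.

```scheme
;; Hypothetical sketch, not Megatest code.
;; stats: alist of (db-file-name . load), or #f if monitoring is off.
(define load-threshold 0.1) ;; same cutoff as in the check-in

(define (lookup-load dbfname stats)
  ;; assoc compares keys with equal?, so string file names work;
  ;; a db with no entry is treated as idle (load 0)
  (let ((entry (assoc dbfname stats)))
    (if entry (cdr entry) 0)))

(define (launch-allowed? dbfname stats have-resources)
  (and (or (not stats)                        ;; no monitoring: do not gate
           (<= (lookup-load dbfname stats) load-threshold))
       have-resources))                       ;; original check still applies

;; Example calls:
;; (launch-allowed? "1.db" '(("1.db" . 0.05)) #t) ;; quiet db, launch allowed
;; (launch-allowed? "1.db" '(("1.db" . 0.5))  #t) ;; busy db, launch gated
```

With a 0.1 threshold, any integer load of 1 or more gates every launch for that db, which is why the out-of-range values noted in the check-in comment matter here.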
Modified tcp-transportmod.scm from [98a778bd3e] to [8195ca9d01].

@@ -1194,14 +1194,15 @@
     (tt:write-load-tracking dbdir)
     (mutex-unlock! *journal-stats-mutex*)
     (thread-sleep! (/ (random 1000) 100.0))
     (loop)))
 
 ;; call this to start a thread that is keeping the journal-stats up to date.
 (define (tt:start-stats dbdir)
+
   (thread-start!
    (make-thread
     (lambda ()(tt:journal-stats-run dbdir)) "Journal stats collection thread")))
 
 (define (tt:get-journal-stats)
   (let* ((result    (make-jstats))
 	 (hitcounts (jstats-jcount result)))
@@ -1226,22 +1227,21 @@
 	(debug:print 0 *default-log-port* "INFO: *journal-stats* not set."))
     ;; convert to normalized alist
     (let ((tot  (min (jstats-count result) 1)) ;; avoid divide by zero
 	  (hits (jstats-jcount result))) ;; 1.db => count
       (hash-table-map
        hits
        (lambda (fname hitcount)
-	 (cons fname (/ hitcount tot)))))
+	 (cons fname (/ hitcount tot)))))))
-    ))
 
 ;; megatest> (import tcp-transportmod)
 ;; megatest> (tt:write-load-tracking ".mtdb")
 ;; megatest> (hash-table-keys *journal-stats*)
 ;; (172060297)
 ;; megatest> (jstats->alist (hash-table-ref *journal-stats* 172060297))
 ;; ((count . 1) (jcount . #<hash-table (1)>))
 ;; megatest> (jstats-jcount (hash-table-ref *journal-stats* 172060297))
 ;; #<hash-table (1)>
 ;; megatest> (hash-table->alist (jstats-jcount (hash-table-ref *journal-stats* 172060297)))
 ;; (("1.db" . 4))
 
 )
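The check-in comment reports loads of 0, 1, 2 ... where values in 0-1.0 were expected. One plausible contributor, visible in tt:get-journal-stats above, is that the divisor is clamped with min rather than max: (min count 1) can never exceed 1, so each quotient is just the raw hit count. This is an observation from the diff, not a confirmed diagnosis, and other parts of the stats collection may also be involved. The sketch below contrasts the two clamps on made-up numbers. The helper normalize-hits, the sample total of 10, and the exact->inexact conversion are all assumptions for illustration.

```scheme
;; Hypothetical sketch, not Megatest code.
;; hits:  alist of (db-file-name . hit-count)
;; total: number of samples taken
;; clamp: min or max, applied as (clamp total 1)
(define (normalize-hits hits total clamp)
  (let ((tot (exact->inexact (clamp total 1)))) ;; force a real divisor
    (map (lambda (pair)
           (cons (car pair) (/ (cdr pair) tot)))
         hits)))

(define sample-hits '(("1.db" . 4))) ;; made-up data

;; min caps the divisor at 1, so the load for 1.db is 4, the raw count
(define with-min (normalize-hits sample-hits 10 min))

;; max only guards the zero case, so the load for 1.db is 0.4
(define with-max (normalize-hits sample-hits 10 max))
```

Using max still avoids the divide by zero: a total of 0 is raised to 1, and any larger total passes through unchanged.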