Megatest

Check-in [2635b582e7]
Overview
Comment: Gate test launch based on journal load. Values from load calc seem wrong. Should be 0-1.0 but seeing integers 0, 1, 2 ...
SHA1: 2635b582e7273323b216f9f1266066a0841abb04
User & Date: mrwellan on 2024-07-10 18:10:05
Context
2024-07-10
20:11 Force values to be real in journal stats collection. still broken though check-in: c906466bb0 user: matt tags: v1.81-journal-based-throttling
18:10 Gate test launch based on journal load. Values from load calc seem wrong. Should be 0-1.0 but seeing integers 0, 1, 2 ... check-in: 2635b582e7 user: mrwellan tags: v1.81-journal-based-throttling
17:44 Added journal based statical droop based throttling of queries. check-in: fc6b05f924 user: mrwellan tags: v1.81-journal-based-throttling
Changes

Modified runs.scm from [0cd899f860] to [dadc9aecb3].

Before:
	 (registry-mutex         (runs:dat-registry-mutex runsdat))
	 (flags                  (runs:dat-flags runsdat))
	 (keyvals                (runs:dat-keyvals runsdat))
	 (run-info               (runs:dat-run-info runsdat))
	 (all-tests-registry     (runs:dat-all-tests-registry runsdat))
	 (run-limits-info        (runs:dat-can-run-more-tests runsdat))
	 ;; (runs:can-run-more-tests run-id jobgroup max-concurrent-jobs)) ;; look at the test jobgroup and tot jobs running
	 (have-resources         (car run-limits-info))
	 (num-running            (list-ref run-limits-info 1))
	 (num-running-in-jobgroup(list-ref run-limits-info 2)) 
	 (max-concurrent-jobs    (list-ref run-limits-info 3))
	 (job-group-limit        (list-ref run-limits-info 4))
	 ;; (prereqs-not-met        (rmt:get-prereqs-not-met run-id waitons hed item-path mode: testmode itemmaps: itemmaps))
	 ;; (prereqs-not-met         (mt:lazy-get-prereqs-not-met run-id waitons item-path mode: testmode itemmap: itemmap))
	 (fails                  (if (list? prereqs-not-met) ;; TODO: rename fails to failed-prereqs

After:
	 (registry-mutex         (runs:dat-registry-mutex runsdat))
	 (flags                  (runs:dat-flags runsdat))
	 (keyvals                (runs:dat-keyvals runsdat))
	 (run-info               (runs:dat-run-info runsdat))
	 (all-tests-registry     (runs:dat-all-tests-registry runsdat))
	 (run-limits-info        (runs:dat-can-run-more-tests runsdat))
	 ;; (runs:can-run-more-tests run-id jobgroup max-concurrent-jobs)) ;; look at the test jobgroup and tot jobs running
	 (have-resources         (and (if *journal-stats*
					  (let* ((dbfname (conc
							   (dbfile:run-id->dbnum run-id)
							   ".db"))
						 (stats (tt:get-journal-stats))
						 (load  (or (alist-ref dbfname stats equal?) 0)))
					    (if (> load 0.1) ;; dbs too busy to start more tests
						(begin
						 (debug:print-info 0 *default-log-port* "Gating launch due to db load "load" based on journal file observations for "dbfname)
						 #f)
						#t))
					  #t) ;; if journal monitoring not started do not gate
				      (car run-limits-info)))
	 (num-running            (list-ref run-limits-info 1))
	 (num-running-in-jobgroup(list-ref run-limits-info 2)) 
	 (max-concurrent-jobs    (list-ref run-limits-info 3))
	 (job-group-limit        (list-ref run-limits-info 4))
	 ;; (prereqs-not-met        (rmt:get-prereqs-not-met run-id waitons hed item-path mode: testmode itemmaps: itemmaps))
	 ;; (prereqs-not-met         (mt:lazy-get-prereqs-not-met run-id waitons item-path mode: testmode itemmap: itemmap))
	 (fails                  (if (list? prereqs-not-met) ;; TODO: rename fails to failed-prereqs

Modified tcp-transportmod.scm from [98a778bd3e] to [8195ca9d01].

Before:
    (tt:write-load-tracking dbdir)
    (mutex-unlock! *journal-stats-mutex*)
    (thread-sleep! (/ (random 1000) 100.0))
    (loop)))

;; call this to start a thread that is keeping the journal-stats up to date.
(define (tt:start-stats dbdir)

  (thread-start!
   (make-thread
    (lambda ()(tt:journal-stats-run dbdir)) "Journal stats collection thread")))

(define (tt:get-journal-stats)
  (let* ((result    (make-jstats))
	 (hitcounts (jstats-jcount result)))

After (whitespace-only change: the blank line after the tt:start-stats define now contains spaces):
    (tt:write-load-tracking dbdir)
    (mutex-unlock! *journal-stats-mutex*)
    (thread-sleep! (/ (random 1000) 100.0))
    (loop)))

;; call this to start a thread that is keeping the journal-stats up to date.
(define (tt:start-stats dbdir)
  
  (thread-start!
   (make-thread
    (lambda ()(tt:journal-stats-run dbdir)) "Journal stats collection thread")))

(define (tt:get-journal-stats)
  (let* ((result    (make-jstats))
	 (hitcounts (jstats-jcount result)))

Before (next hunk):
	(debug:print 0 *default-log-port* "INFO: *journal-stats* not set."))
    ;; convert to normalized alist
    (let ((tot  (min (jstats-count result) 1)) ;; avoid divide by zero
	  (hits (jstats-jcount result))) ;; 1.db => count
      (hash-table-map
       hits
       (lambda (fname hitcount)
	 (cons fname (/ hitcount tot)))))
    ))

;; megatest> (import tcp-transportmod)
;; megatest> (tt:write-load-tracking ".mtdb")
;; megatest> (hash-table-keys *journal-stats*)
;; (172060297)
;; megatest> (jstats->alist (hash-table-ref *journal-stats* 172060297))
;; ((count . 1) (jcount . #<hash-table (1)>))
;; megatest> (jstats-jcount (hash-table-ref *journal-stats* 172060297))
;; #<hash-table (1)>
;; megatest> (hash-table->alist (jstats-jcount (hash-table-ref *journal-stats* 172060297)))
;; (("1.db" . 4))

)

After:
	(debug:print 0 *default-log-port* "INFO: *journal-stats* not set."))
    ;; convert to normalized alist
    (let ((tot  (min (jstats-count result) 1)) ;; avoid divide by zero
	  (hits (jstats-jcount result))) ;; 1.db => count
      (hash-table-map
       hits
       (lambda (fname hitcount)
	 (cons fname (/ hitcount tot)))))))


;; megatest> (import tcp-transportmod)
;; megatest> (tt:write-load-tracking ".mtdb")
;; megatest> (hash-table-keys *journal-stats*)
;; (172060297)
;; megatest> (jstats->alist (hash-table-ref *journal-stats* 172060297))
;; ((count . 1) (jcount . #<hash-table (1)>))
;; megatest> (jstats-jcount (hash-table-ref *journal-stats* 172060297))
;; #<hash-table (1)>
;; megatest> (hash-table->alist (jstats-jcount (hash-table-ref *journal-stats* 172060297)))
;; (("1.db" . 4))

)
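
The check-in comment reports integer load values where fractions between 0 and 1.0 were expected. One plausible contributor, visible in the hunk above, is the `(min (jstats-count result) 1)` clamp: for any count of 1 or more it yields 1, so each "load" is just the raw hit count, and a count of 0 is not protected from division by zero. The gate in runs.scm compares load against 0.1, so unnormalized counts of 1 or more would always gate. The sketch below is not the project's code; `normalize-hits`, `sample-hits`, and the sample numbers are hypothetical stand-ins that use plain alists in place of the jstats record and hash table, purely to compare the `min` and `max` clamps.

```scheme
;; Hypothetical helper: count stands in for (jstats-count result) and
;; hits for the per-db journal hit counts, here a plain alist.
;; clamp is min or max so the two behaviours can be compared.
(define (normalize-hits count hits clamp)
  (let ((tot (clamp count 1)))
    (map (lambda (pr)
           ;; exact->inexact forces a real even when the quotient is exact
           (cons (car pr) (exact->inexact (/ (cdr pr) tot))))
         hits)))

;; Made-up sample data: 8 observations, journal seen 4 times and once.
(define sample-hits '(("1.db" . 4) ("2.db" . 1)))

;; min clamps tot down to 1, so the results are the raw hit counts.
(define with-min (normalize-hits 8 sample-hits min))

;; max keeps tot at the real sample count, so the results are fractions.
(define with-max (normalize-hits 8 sample-hits max))

(display with-min) (newline)
(display with-max) (newline)
```

This only addresses the clamp. The REPL transcript in the comments shows a count of 1 alongside a hit count of 4 for "1.db", so the denominator itself may be under-counted; if so, ratios above 1.0 could remain even with `max`.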