Megatest

Changes On Branch v1.65-lazyqueue-items-rollup
Login

Changes In Branch v1.65-lazyqueue-items-rollup Excluding Merge-Ins

This is equivalent to a diff from 2769e4b7c9 to d5bb9fca6e

2021-03-01
17:42
Manually patched in the new view check-in: f5206150ee user: mrwellan tags: v1.6569-new-view
2021-02-25
22:05
remove-mutex From: 022801133154585551c60feee4baf6758b91f993 User: matt Closed-Leaf check-in: d5bb9fca6e user: matt tags: v1.65-lazyqueue-items-rollup
22:05
add finalize of no-sync and re-enable the mutex From: 1775254e3f45db753fe942e0f142c3ec732dbcb6 User: matt check-in: d616d48e39 user: matt tags: v1.65-lazyqueue-items-rollup
2021-01-26
14:00
Fix for the > crash. Maybe... Leaf check-in: 5a05fc04ff user: matt tags: v1.6569-gt-crash-fix
2021-01-25
12:03
rebased lazy-queue rollup check-in: 07ab120544 user: matt tags: v1.65-lazyqueue-items-rollup
2021-01-15
22:46
begin diet check-in: badd71f3b3 user: matt tags: v1.6569-diet
21:34
eval-string-in-environment if was disabled, re-enabled check-in: 9564772564 user: matt tags: v1.6569-reenable-eval-if
2021-01-08
11:42
enable custom value for max delay between archive time and test last update time Leaf check-in: 86a3d1148e user: pjhatwal tags: v1.6569-refactor
2020-11-25
12:00
Fixed issues in server gating code Leaf check-in: 063273e8cb user: mrwellan tags: v1.6569-server-gate-fix
2020-11-24
22:27
Added support for resetting run - allows to reload tests-paths to add tests to a run part way though. Just run megatest -clean-cache -runname $MT_RUNNAME Leaf check-in: 213021e02d user: mrwellan tags: v1.6596-reload-tests-paths
2020-10-13
16:46
Changed version from 69 to 76. No other changes. Will compile with chicken 13 check-in: 87ca35010f user: mmgraham tags: v1.65, v1.6576
2020-10-12
16:49
Reduced message from failed to info. Reverted a delay which seems to help pass full stack ext-tests. Leaf check-in: 9e35b1252c user: mrwellan tags: v1.65-minor-patch
10:18
Safe vector access in rmt. check-in: 58bb6d997a user: mrwellan tags: v1.65-side2
2020-10-11
22:46
Patched forward adjutant code. check-in: f936717bfa user: matt tags: v1.65-adjutant-again
2020-10-05
22:49
Do not exit on failure to create directory - race conditons on NFS cause false fail scenarios - just keep going and cross your fingers... (cherrypicked from v1.6572) check-in: 05b253a452 user: matt tags: v1.65-sidework
22:46
run duration testdat check-in: 4a0b43f3c6 user: matt tags: v1.65-test-rundat2
2020-09-21
15:36
merged in 1.65-test-rundat branch ==/FAIL/orion,mars/== check-in: cfd25d66e9 user: mmgraham tags: v1.6571, v1.65-failed-testdat
07:00
Added get-testsuite-name all over launch:setup and still not set when needed! This did NOT work. Closed-Leaf check-in: 2efe8ad422 user: mrwellan tags: v1.65-get-testsuitename
2020-09-19
04:21
Start moving test_rundat to no-sync db. ==/20/2/WARN/1203/mars/== check-in: abfabdb839 user: matt tags: v1.65-test-rundat
2020-09-18
17:30
added check for file existence before file delete ==/14/1.9/WARN/orion,mars/== NOTE: This is the last v1.65 before the split off. I.e code from before this point IS in the far future v1.65 branch. Code from this point to that branch might NOT be in the branch. check-in: 2769e4b7c9 user: mmgraham tags: v1.65, v1.6569
12:27
cherry picked 2 fixes, changed version to 1.6569 ==/7.2/2.0/PASS/1201/mars/== check-in: d145d0eb02 user: mmgraham tags: v1.65

Modified common.scm from [33c7316880] to [775d33a7fc].

1073
1074
1075
1076
1077
1078
1079





1080
1081
1082
1083
1084
1085
1086
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
1091







+
+
+
+
+







				  (let ((db (cdr *task-db*)))
				    (if (sqlite3:database? db)
					(begin
					  (sqlite3:interrupt! db)
					  (sqlite3:finalize! db #t)
					  ;; (vector-set! *task-db* 0 #f)
					  (set! *task-db* #f)))))
			      (if (and *no-sync-db*
				       (sqlite3:database? *no-sync-db*))
				  (begin
				    (sqlite3:interrupt! *no-sync-db*)
				    (sqlite3:finalize! *no-sync-db* #t)))
                              (http-client#close-all-connections!)
                              ;; (if (and *runremote*
                              ;;          (remote-conndat *runremote*))
                              ;;     (begin
                              ;;       (http-client#close-all-connections!))) ;; for http-client
                              (if (not (eq? *default-log-port* (current-error-port)))
                                  (close-output-port *default-log-port*))

Modified db.scm from [fb3a18f52f] to [e5323ac36a].

2160
2161
2162
2163
2164
2165
2166

2167



2168
2169

2170



2171
2172
2173
2174
2175
2176
2177
2160
2161
2162
2163
2164
2165
2166
2167

2168
2169
2170
2171
2172
2173

2174
2175
2176
2177
2178
2179
2180
2181
2182
2183







+
-
+
+
+


+
-
+
+
+







                 (let ((db (db:open-no-sync-db)))
                   (set! *no-sync-db* db)
                   db))))
    (mutex-unlock! *db-access-mutex*)
    res))

(define (db:no-sync-set db var val)
  ;; (mutex-lock! *db-access-mutex*)
  (sqlite3:execute (db:no-sync-db db) "INSERT OR REPLACE INTO no_sync_metadat (var,val) VALUES (?,?);" var val))
  (let ((res (sqlite3:execute (db:no-sync-db db) "INSERT OR REPLACE INTO no_sync_metadat (var,val) VALUES (?,?);" var val)))
    ;; (mutex-unlock! *db-access-mutex*)
    res))

(define (db:no-sync-del! db var)
  ;; (mutex-lock! *db-access-mutex*)
  (sqlite3:execute (db:no-sync-db db) "DELETE FROM no_sync_metadat WHERE var=?;" var))
  (sqlite3:execute (db:no-sync-db db) "DELETE FROM no_sync_metadat WHERE var=?;" var)
  ;; (mutex-unlock! *db-access-mutex*)
  )

(define (db:no-sync-get/default db var default)
  (let ((res default))
    (sqlite3:for-each-row
     (lambda (val)
       (set! res val))
     (db:no-sync-db db)
3213
3214
3215
3216
3217
3218
3219
3220
3221
3222
3223




3224
3225
3226
3227
3228
3229
3230
3219
3220
3221
3222
3223
3224
3225




3226
3227
3228
3229
3230
3231
3232
3233
3234
3235
3236







-
-
-
-
+
+
+
+







       (if newstatus  (sqlite3:execute db "UPDATE tests SET status=?  WHERE id=?;" newstatus  test-id))
       (if newcomment (sqlite3:execute db "UPDATE tests SET comment=? WHERE id=?;" newcomment ;; (sdb:qry 'getid newcomment)
				       test-id))))))
  (mt:process-triggers dbstruct run-id test-id newstate newstatus))

;; NEW BEHAVIOR: Count tests running in all runs!
;;
(define (db:get-count-tests-running dbstruct run-id fastmode)
  (let* ((qry (if fastmode
		  "SELECT count(id) FROM tests WHERE state in ('RUNNING','LAUNCHED','REMOTEHOSTSTART') AND NOT (uname = 'n/a' AND item_path = '') LIMIT 1;"
		  "SELECT count(id) FROM tests WHERE state in ('RUNNING','LAUNCHED','REMOTEHOSTSTART') AND NOT (uname = 'n/a' AND item_path = '');")))
(define (db:get-count-tests-running dbstruct run-id) ;; fastmode)
  (let* ((qry ;; (if fastmode
		;;   "SELECT count(id) FROM tests WHERE state in ('RUNNING','LAUNCHED','REMOTEHOSTSTART') AND NOT (uname = 'n/a' AND item_path = '') LIMIT 1;"
		  "SELECT count(id) FROM tests WHERE state in ('RUNNING','LAUNCHED','REMOTEHOSTSTART') AND NOT (uname = 'n/a' AND item_path = '');")) ;; )
  (db:with-db
   dbstruct
   run-id
   #f
   (lambda (db)
     (let* ((stmth (db:get-cache-stmth dbstruct  db qry)))
       (sqlite3:first-result stmth))))))
3243
3244
3245
3246
3247
3248
3249
3250
3251
3252
3253




3254
3255
3256
3257
3258
3259
3260
3249
3250
3251
3252
3253
3254
3255




3256
3257
3258
3259
3260
3261
3262
3263
3264
3265
3266







-
-
-
-
+
+
+
+







      ;; "SELECT count(id) FROM tests WHERE state in ('RUNNING','LAUNCHED','REMOTEHOSTSTART') AND run_id NOT IN (SELECT id FROM runs WHERE state='deleted') AND NOT (uname = 'n/a' AND item_path = '');")
      "SELECT count(id) FROM tests WHERE state in ('RUNNING','REMOTEHOSTSTART','LAUNCHED') AND run_id=?;" 
      run-id)))) ;; NOT IN (SELECT id FROM runs WHERE state='deleted');")

;; NEW BEHAVIOR: Look only at single run with run-id
;; 
;; (define (db:get-running-stats dbstruct run-id)
(define (db:get-count-tests-running-for-run-id dbstruct run-id fastmode)
  (let* ((qry (if fastmode
		  "SELECT count(id) FROM tests WHERE state in ('RUNNING','LAUNCHED','REMOTEHOSTSTART') AND run_id=? LIMIT 1;"
		  "SELECT count(id) FROM tests WHERE state in ('RUNNING','LAUNCHED','REMOTEHOSTSTART') AND run_id=?;")))
(define (db:get-count-tests-running-for-run-id dbstruct run-id) ;; fastmode)
  (let* ((qry ;; (if fastmode
		 ;;  "SELECT count(id) FROM tests WHERE state in ('RUNNING','LAUNCHED','REMOTEHOSTSTART') AND run_id=? LIMIT 1;"
		  "SELECT count(id) FROM tests WHERE state in ('RUNNING','LAUNCHED','REMOTEHOSTSTART') AND run_id=?;")) ;; )
    (db:with-db
     dbstruct
     run-id
     #f
     (lambda (db)
       (let* ((stmth (db:get-cache-stmth dbstruct db qry)))
	 (sqlite3:first-result stmth run-id))))))
3966
3967
3968
3969
3970
3971
3972
3973






















































































































3974
3975
3976
3977
3978
3979

3980
3981
3982
3983
3984
3985
3986
3972
3973
3974
3975
3976
3977
3978
3979
3980
3981
3982
3983
3984
3985
3986
3987
3988
3989
3990
3991
3992
3993
3994
3995
3996
3997
3998
3999
4000
4001
4002
4003
4004
4005
4006
4007
4008
4009
4010
4011
4012
4013
4014
4015
4016
4017
4018
4019
4020
4021
4022
4023
4024
4025
4026
4027
4028
4029
4030
4031
4032
4033
4034
4035
4036
4037
4038
4039
4040
4041
4042
4043
4044
4045
4046
4047
4048
4049
4050
4051
4052
4053
4054
4055
4056
4057
4058
4059
4060
4061
4062
4063
4064
4065
4066
4067
4068
4069
4070
4071
4072
4073
4074
4075
4076
4077
4078
4079
4080
4081
4082
4083
4084
4085
4086
4087
4088
4089
4090
4091
4092
4093
4094
4095
4096
4097
4098
4099
4100
4101
4102

4103
4104
4105
4106
4107
4108
4109
4110








+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+





-
+







;;    ;; 	(db:general-call dbdat 'state-status-msg (list state status msg test-id))
;;    ;; 	(db:general-call dbdat 'state-status     (list state status test-id)))
;;    (db:set-state-status-and-roll-up-items dbstruct run-id test-id #f state status msg)
;;    ;; process the test_data table
;;    (if (and test-id state status (equal? status "AUTO")) 
;; 	(db:test-data-rollup dbstruct run-id test-id status))
;;    (mt:process-triggers dbstruct run-id test-id state status)))

;; NOT FINISHED 
(define (db:calc-state-status-toplevel state status tl-state tl-status)
  `(,state ,status))

;;   (match state
;;     (("COMPLETED")
;;      (match `(,tl-state ,tl-status)
;;        (("COMPLETED" "PASS") `(,state ,status))
;;        (("COMPLETED" thestatus)
;; 	(case (string->symbol thestatus)
;; 	  ((ABORT CHECK DEAD)
;; 	   (if `("COMPLETED" ,thestatus))
;; 	(match `(,thestatus ,status)
;; 	  (("FAIL" "ABORT") '("COMPLETED" "ABORT"))
;; 	  (("FAIL" "CHECK") '("COMPLETED" "CHECK"))
;; 	  (("FAIL" "DEAD")  '("COMPLETED" "DEAD"))
;; 	  (("WARN" "FAIL")  '("COMPLETED" "FAIL"))
;; 	  (("WARN" "CHECK") '("COMPLETED" "CHECK"))
;; 	  (("WARN" "DEAD")
       
(define (db:set-state-status-and-roll-up-items dbstruct run-id test-name item-path state status comment)
  ;; establish info on incoming test followed by info on top level test
  ;; BBnote - for mode itemwait, linkage between upstream test & matching item status is propagated to run queue in db:prereqs-not-met
  ;; (mutex-lock! *db-transaction-mutex*) ;; why do we need a mutex?
  (let* ((testdat      (if (number? test-name)
			   (db:get-test-info-by-id dbstruct run-id test-name) ;; test-name is actually a test-id
			   (db:get-test-info       dbstruct run-id test-name item-path)))
	 (test-id      (db:test-get-id testdat))
	 (test-name    (if (number? test-name)
			   (db:test-get-testname testdat)
			   test-name))
	 (no-sync-db   (db:no-sync-db #f))
	 (rollup-flag  #f)
	 (wait-flag    #f)
	 (rollup-lock-key  (conc run-id "-rollup-" test-name))
	 (waiting-lock-key (conc run-id "-waiting-" test-name)))
    (db:test-set-state-status dbstruct run-id test-id state status #f)
    (if (and test-id state status (equal? status "AUTO")) 
	(db:test-data-rollup dbstruct run-id test-id status))
    (if (member state '("LAUNCHED" "REMOTEHOSTSTART"))
	(db:general-call dbstruct 'set-test-start-time (list test-id)))
    (if (not (equal? item-path "")) ;; only roll up IF incoming test is an item
	(begin
	  ;; is there a rollup lock? If not, take it
	  (sqlite3:with-transaction
	   no-sync-db
	   (lambda ()
	     ;; (debug:print 0 *default-log-port* "EXCEPTION: exn="exn)
	     (let* ((rollup-lock-time (db:no-sync-get/default no-sync-db rollup-lock-key #f))
		    (waiting-lock-time (db:no-sync-get/default no-sync-db waiting-lock-key #f)))
	       (if rollup-lock-time ;; someone is doing a rollup
		   (if (not waiting-lock-time) ;; no one is waiting
		       (begin
			 (set! wait-flag #t)
			 (set! rollup-flag #t)
			 (db:no-sync-set no-sync-db waiting-lock-key (current-seconds)))) ;; we are going to wait
		   (begin
		     (set! rollup-flag #t)
		     (db:no-sync-set no-sync-db rollup-lock-key (current-seconds)))))))
	  (if wait-flag
	      (let loop ((count 10)) ;; about 20 seconds
		(thread-sleep! 2)
		(if (and (not (db:no-sync-get/default no-sync-db waiting-lock-key #f))
			 (> count 0))
		    (loop (+ count 1))
		    (sqlite3:with-transaction
		     no-sync-db
		     (lambda ()
		       (db:no-sync-set no-sync-db rollup-lock-key (current-seconds))
		       (db:no-sync-del! no-sync-db waiting-lock-key))))))
	  ;; now the rollup
	  ;; (mutex-unlock! *db-transaction-mutex*) ;; why do we need a mutex?
	  (if rollup-flag ;; put this into a thread
	      (begin
		;; (thread-start! (make-thread
		;; 	      (lambda ()
		(db:roll-up-test-state-status dbstruct run-id test-name state status)
		(db:no-sync-del! no-sync-db rollup-lock-key))
	      ;; (conc "thread for run-id: " run-id " test-name: " test-name))))))))
	      ))
	;; (mutex-unlock! *db-transaction-mutex*) ;; why do we need a mutex?
	)))
	      
;; I'd like to remove the need for item-path - it is logically not needed here
;; for now we pass in state and status - NOTE: There is a possible race if a test
;; is rapidly re-run while an earlier run is waiting to rollup.
;;
(define (db:roll-up-test-state-status dbstruct run-id test-name state status)
  (let* ((testdat      (if (number? test-name)
			   (db:get-test-info-by-id dbstruct run-id test-name) ;; test-name is actually a test-id
			   (db:get-test-info       dbstruct run-id test-name "")))
	 (test-id      (db:test-get-id testdat))
	 (test-name    (if (number? test-name)
			   (db:test-get-testname testdat)
			   test-name))
	 (tl-testdat   (db:get-test-info dbstruct run-id test-name ""))
	 (tl-test-id   (db:test-get-id tl-testdat)))
    (db:with-db
     dbstruct #f #f
     (lambda (db)
       ;; NB// Pass the db so it is part fo the transaction
       ;; item-path is used in get-all-state-status counts to exclude current state/status of THIS test
       ;; but with the state/status being set earlier this is not needed any longer
       (let* ((state-status-counts  (db:get-all-state-status-counts-for-testname dbstruct run-id test-name))
	      (state-statuses       (if (null? state-status-counts)
					'()
					(db:roll-up-rules state-status-counts state status)))
	      (newstate             (if (null? state-statuses)
					state
					(car state-statuses)))
	      (newstatus            (if (null? state-statuses)
					status
					(cadr state-statuses))))
	 (if tl-test-id
	     (db:test-set-state-status db run-id tl-test-id newstate newstatus #f))
	 )))
    #t))

;; state is the priority rollup of all states
;; status is the priority rollup of all completed statesfu
;;
;; if test-name is an integer work off that instead of test-name test-path
;;
(define (db:set-state-status-and-roll-up-items dbstruct run-id test-name item-path state status comment)
(define (db:set-state-status-and-roll-up-items-orig dbstruct run-id test-name item-path state status comment)
  ;; establish info on incoming test followed by info on top level test
  ;; BBnote - for mode itemwait, linkage between upstream test & matching item status is propagated to run queue in db:prereqs-not-met
  (let* ((testdat      (if (number? test-name)
			   (db:get-test-info-by-id dbstruct run-id test-name) ;; test-name is actually a test-id
			   (db:get-test-info       dbstruct run-id test-name item-path)))
	 (test-id      (db:test-get-id testdat))
	 (test-name    (if (number? test-name)
3997
3998
3999
4000
4001
4002
4003
4004

4005
4006
4007
4008
4009
4010



4011
4012
4013
4014
4015
4016
4017
4018
4019
4020
4021
4022
4023
4024
4025
4026
4027
4028
4029
4030
4031
4032
4033
4034
4035
4036
4037
4038
4039
4040
4041
4042
4043
4044
4045
4046
4047
4048
4049
4050
4051
4052
4053
4054
4055
4056
4057
4058
4059
4060
4061
4062
4063
4064
4065
4066
4067
4068
4069
4070
4071
4072
4073















































4074
4075
4076
4077
4078
4079
4080


4081
4082
4083

4084
4085
4086
4087
4088
4089
4090
4091
4092
4093
4094







4095
4096
4097

4098
4099
4100
4101
4102
4103
4104
4121
4122
4123
4124
4125
4126
4127

4128
4129
4130
4131



4132
4133
4134
4135
4136
4137
4138
4139
4140
4141
4142
4143
4144
4145
4146
4147
4148
4149
4150















































4151
4152
4153
4154
4155
4156
4157
4158
4159
4160
4161
4162
4163
4164
4165
4166
4167
4168
4169
4170
4171
4172
4173
4174
4175
4176
4177
4178
4179
4180
4181
4182
4183
4184
4185
4186
4187
4188
4189
4190
4191
4192
4193
4194
4195
4196
4197
4198
4199
4200
4201



4202
4203
4204
4205

4206
4207
4208
4209
4210
4211
4212
4213
4214



4215
4216
4217
4218
4219
4220
4221
4222
4223

4224
4225
4226
4227
4228
4229
4230
4231







-
+



-
-
-
+
+
+
















-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+




-
-
-
+
+


-
+








-
-
-
+
+
+
+
+
+
+


-
+







    (db:with-db
     dbstruct #f #f
     (lambda (db)
       (let ((tr-res
              (sqlite3:with-transaction
               db
               (lambda ()
                 ;; NB// Pass the db so it is part fo the transaction
                 ;; NB// Pass the db so it is part of the transaction
                 (db:test-set-state-status db run-id test-id state status comment) ;; this call sets the item state/status
                 (if (not (equal? item-path "")) ;; only roll up IF incoming test is an item
                     (let* ((state-status-counts  (db:get-all-state-status-counts-for-test dbstruct run-id test-name item-path state status)) ;; item-path is used to exclude current state/status of THIS test
			      						  (state-stauses (db:roll-up-rules state-status-counts state status))
                          (newstate (car state-stauses))
                          (newstatus (cadr state-stauses)))
			    (state-statuses        (db:roll-up-rules state-status-counts state status))
                          (newstate  (car state-statuses))
                          (newstatus (cadr state-statuses)))
                       (debug:print 4 *default-log-port* "BB> tl-test-id="tl-test-id" ; "test-name":"item-path" newstate="newstate" newstatus="newstatus" len(sscs)="(length state-status-counts)  " state-status-counts: "
							(apply conc
                  (map (lambda (x)
                     (conc
                     		(with-output-to-string (lambda () (pp (dbr:counts->alist x)))) " | "))
                              state-status-counts))); end debug:print
   
                       (if tl-test-id
			   (db:test-set-state-status db run-id tl-test-id newstate newstatus #f)) ;; we are still in the transaction - must access the db and not the dbstruct
		       ))))))
         (mutex-unlock! *db-transaction-mutex*)
         (if (and test-id state status (equal? status "AUTO")) 
             (db:test-data-rollup dbstruct run-id test-id status))
         tr-res)))))

(define (db:roll-up-rules state-status-counts state status)
		(let* ((running     (length (filter (lambda (x)
                          (member (dbr:counts-state x) *common:running-states*))
                                 state-status-counts)))
           (bad-not-started      (length (filter (lambda (x)
                                      (and (equal? (dbr:counts-state x) "NOT_STARTED") 
                                        (not (member (dbr:counts-status x)  *common:not-started-ok-statuses*))))
																	state-status-counts)))
           (all-curr-states      (common:special-sort  ;; worst -> best (sort of)
                                    (delete-duplicates
                                      (if (and state (not (member state *common:dont-roll-up-states*)))
                                          (cons state (map dbr:counts-state state-status-counts))
                                          (map dbr:counts-state state-status-counts)))
                                                  *common:std-states* >))
           (all-curr-statuses    (common:special-sort  ;; worst -> best
                                    (delete-duplicates
                                      (if (and state status (not (member state *common:dont-roll-up-states*)))
                                          (cons status (map dbr:counts-status state-status-counts))
                                          (map dbr:counts-status state-status-counts)))
                                                   *common:std-statuses* >))
           (non-completes        (filter (lambda (x)
							 										 (not (member x (cons "COMPLETED" *common:dont-roll-up-states*))))
						       									all-curr-states))
			     (preq-fails        (filter (lambda (x)
							 								(equal? x "PREQ_FAIL"))
						       							all-curr-statuses))
           (num-non-completes (length non-completes))
 					 (newstate          (cond
															((> running 0)           "RUNNING")            ;; anything running, call the situation running
                              ((> (length preq-fails) 0) "NOT_STARTED")
															((> bad-not-started 0)   "COMPLETED")          ;; we have an ugly situation, it is completed in the sense we cannot do more.
															((> num-non-completes 0) (car non-completes))  ;;  (remove (lambda (x)(equal? "COMPLETED" x)) all-curr-states))) ;; only rollup DELETED if all DELETED
															(else                    (car all-curr-states))))
           (newstatus         (cond
                              ((> (length preq-fails) 0)  "PREQ_FAIL")
                              ((or (> bad-not-started 0)
                                   (and (equal? newstate "NOT_STARTED")
                                      (> num-non-completes 0)))
                                            "STARTED")
                              (else (car all-curr-statuses)))))
 					(debug:print-info 2 *default-log-port*
                                         "\n--> probe db:set-state-status-and-roll-up-items: "
                                         "\n--> state-status-counts: "(map dbr:counts->alist state-status-counts)
                                         "\n--> running:             "running
                                         "\n--> bad-not-started:     "bad-not-started
                                         "\n--> non-non-completes:   "num-non-completes
                                         "\n--> non-completes:       "non-completes
                                         "\n--> all-curr-states:     "all-curr-states
  (let* ((running              (length (filter (lambda (x)
						 (member (dbr:counts-state x) *common:running-states*))
					       state-status-counts)))
	 (bad-not-started      (length (filter (lambda (x)
						 (and (equal? (dbr:counts-state x) "NOT_STARTED") 
						      (not (member (dbr:counts-status x)  *common:not-started-ok-statuses*))))
					       state-status-counts)))
	 (all-curr-states      (common:special-sort  ;; worst -> best (sort of)
				(delete-duplicates
				 (if (and state (not (member state *common:dont-roll-up-states*)))
				     (cons state (map dbr:counts-state state-status-counts))
				     (map dbr:counts-state state-status-counts)))
				*common:std-states* >))
	 (all-curr-statuses    (common:special-sort  ;; worst -> best
				(delete-duplicates
				 (if (and state status (not (member state *common:dont-roll-up-states*)))
				     (cons status (map dbr:counts-status state-status-counts))
				     (map dbr:counts-status state-status-counts)))
				*common:std-statuses* >))
	 (non-completes        (filter (lambda (x)
					 (not (member x (cons "COMPLETED" *common:dont-roll-up-states*))))
				       all-curr-states))
	 (preq-fails        (filter (lambda (x)
				      (equal? x "PREQ_FAIL"))
				    all-curr-statuses))
	 (num-non-completes (length non-completes))
	 (newstate          (cond
			     ((> running 0)             "RUNNING")            ;; anything running, call the situation running
			     ((> (length preq-fails) 0) "NOT_STARTED")
			     ((> bad-not-started 0)     "COMPLETED")          ;; we have an ugly situation, it is completed in the sense we cannot do more.
			     ((> num-non-completes 0)   (car non-completes))  ;;  (remove (lambda (x)(equal? "COMPLETED" x)) all-curr-states))) ;; only rollup DELETED if all DELETED
			     (else                      (car all-curr-states))))
	 (newstatus         (cond
			     ((> (length preq-fails) 0)  "PREQ_FAIL")
			     ((or (> bad-not-started 0)
				  (and (equal? newstate "NOT_STARTED")
				       (> num-non-completes 0)))
			      "STARTED")
			     (else (car all-curr-statuses)))))
    (debug:print-info 2 *default-log-port*
		      "\n--> probe db:set-state-status-and-roll-up-items: "
		      "\n--> state-status-counts: "(map dbr:counts->alist state-status-counts)
		      "\n--> running:             "running
		      "\n--> bad-not-started:     "bad-not-started
		      "\n--> non-non-completes:   "num-non-completes
		      "\n--> non-completes:       "non-completes
		      "\n--> all-curr-states:     "all-curr-states
                                         "\n--> all-curr-statuses:     "all-curr-statuses
                                         "\n--> newstate              "newstate
                                         "\n--> newstatus            "newstatus
                                         "\n\n")

                        ;; NB// Pass the db so it is part of the transaction
         (list newstate newstatus)))
    ;; NB// Pass the db so it is part of the transaction
    (list newstate newstatus)))

(define (db:set-state-status-and-roll-up-run dbstruct run-id curr-state curr-status)
    (mutex-lock! *db-transaction-mutex*)
    ;; (mutex-lock! *db-transaction-mutex*)
    (db:with-db
     dbstruct #f #f
     (lambda (db)
       (let ((tr-res
              (sqlite3:with-transaction
               db
               (lambda ()
                   (let* ((state-status-counts  (db:get-all-state-status-counts-for-run dbstruct run-id))
													(state-stauses (db:roll-up-rules state-status-counts #f #f ))
                          (newstate (car state-stauses))
                          (newstatus (cadr state-stauses))) 
			  (state-statuses        (db:roll-up-rules state-status-counts #f #f ))
                          (newstate             (if (null? state-statuses)
						    curr-state
						    (car state-statuses)))
			  (newstatus            (if (null? state-statuses)
						    curr-status
						    (cadr state-statuses))))
                    (if (or (not (eq? newstate curr-state)) (not (eq?  newstatus curr-status)))
                   (db:set-run-state-status dbstruct run-id newstate newstatus )))))))
         (mutex-unlock! *db-transaction-mutex*)
         ;; (mutex-unlock! *db-transaction-mutex*)
         tr-res))))


(define (db:get-all-state-status-counts-for-run dbstruct run-id)
 (let* ((test-count-recs (db:with-db
                                  dbstruct #f #f
                                  (lambda (db)
4146
4147
4148
4149
4150
4151
4152






























4153
4154
4155
4156
4157
4158
4159
4273
4274
4275
4276
4277
4278
4279
4280
4281
4282
4283
4284
4285
4286
4287
4288
4289
4290
4291
4292
4293
4294
4295
4296
4297
4298
4299
4300
4301
4302
4303
4304
4305
4306
4307
4308
4309
4310
4311
4312
4313
4314
4315
4316







+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+








         (nonmatch-countrec-lambda (lambda (countrec) (not (match-countrec-lambda countrec))))
         
         (unrelated-rec-list   
          (filter nonmatch-countrec-lambda other-items-count-recs)))
    
    (cons updated-count-rec unrelated-rec-list)))

;; full count not including toplevel
;;
(define (db:get-all-state-status-counts-for-testname dbstruct run-id test-name)
  (let* ((test-count-recs (db:with-db
			   dbstruct #f #f
			   (lambda (db)
			     (sqlite3:map-row
			      (lambda (state status count)
				(make-dbr:counts state: state status: status count: count))
			      db
			      "SELECT state,status,count(id) FROM tests WHERE run_id=? AND testname=? AND item_path != '' GROUP BY state,status;"
                                     run-id test-name)))))
    test-count-recs))


;; full count not including toplevel
;;
(define (db:get-all-state-status-counts-for-testname dbstruct run-id test-name)
  (let* ((test-count-recs (db:with-db
			   dbstruct #f #f
			   (lambda (db)
			     (sqlite3:map-row
			      (lambda (state status count)
				(make-dbr:counts state: state status: status count: count))
			      db
			      "SELECT state,status,count(id) FROM tests WHERE run_id=? AND testname=? AND item_path != '' GROUP BY state,status;"
                                     run-id test-name)))))
    test-count-recs))


;; (define (db:get-all-item-states db run-id test-name)
;;   (sqlite3:map-row 
;;    (lambda (a) a)
;;    db
;;    "SELECT DISTINCT state FROM tests WHERE item_path != '' AND state != 'DELETED' AND run_id=? AND testname=?"
;;    run-id test-name))

Modified launch.scm from [d0067277fa] to [4f3065814f].

729
730
731
732
733
734
735
736

737
738
739
740
741
742
743
729
730
731
732
733
734
735

736
737
738
739
740
741
742
743







-
+







;; 100% COMPLETED/ (PASS,FAIL,ABORT etc.) ==> COMPLETED / X where X is same as itemized rollup
;; > 3 RUNNING with not test_dead do nothing (run should already be RUNNING/ na
;; > 0 RUNNING and test_dead then send KILLREQ ==> COMPLETED
;; 0 RUNNING ==> this is actually the first condition, should not get here

(define (launch:end-of-run-check run-id )
    (let* ((not-completed-cnt (rmt:get-not-completed-cnt run-id))  
           (running-cnt (rmt:get-count-tests-running-for-run-id run-id #f)) ;; fastmode=no
           (running-cnt (rmt:get-count-tests-running-for-run-id run-id)) ;; fastmode=no
           (all-test-launched (rmt:get-var (conc "lunch-complete-" run-id)))
           (current-state (rmt:get-run-state run-id))
           (current-status (rmt:get-run-status run-id)))
     ;;get-vars run-id to query metadata table to check if all completed. if all-test-launched = yes then only not-completed-cnt = 0 means everyting is completed if no entry found in the table do nothing 
     (debug:print 0 *default-log-port* "Running test cnt :" running-cnt)                      
     (rmt:set-state-status-and-roll-up-run  run-id current-state current-status)
     (runs:update-junit-test-reporter-xml run-id) 

Modified rmt.scm from [39d97c528a] to [169f275811].

659
660
661
662
663
664
665
666
667


668
669
670
671
672
673
674
675
676


677
678
679
680
681
682
683
659
660
661
662
663
664
665


666
667
668
669
670
671
672
673
674


675
676
677
678
679
680
681
682
683







-
-
+
+







-
-
+
+







	   (map (lambda (run-id)
		  (rmt:send-receive 'test-get-paths-matching-keynames-target-new run-id (list run-id keynames target res testpatt statepatt statuspatt runname)))
	   run-ids))))

(define (rmt:get-prereqs-not-met run-id waitons ref-test-name ref-item-path #!key (mode '(normal))(itemmaps #f))
  (rmt:send-receive 'get-prereqs-not-met run-id (list run-id waitons ref-test-name ref-item-path mode itemmaps)))

(define (rmt:get-count-tests-running-for-run-id run-id fastmode)
  (rmt:send-receive 'get-count-tests-running-for-run-id run-id (list run-id fastmode)))
(define (rmt:get-count-tests-running-for-run-id run-id)
  (rmt:send-receive 'get-count-tests-running-for-run-id run-id (list run-id)))

(define (rmt:get-not-completed-cnt run-id)
  (rmt:send-receive 'get-not-completed-cnt run-id (list run-id)))


;; Statistical queries

(define (rmt:get-count-tests-running run-id fastmode)
  (rmt:send-receive 'get-count-tests-running run-id (list run-id fastmode)))
(define (rmt:get-count-tests-running run-id)
  (rmt:send-receive 'get-count-tests-running run-id (list run-id)))

(define (rmt:get-count-tests-running-for-testname run-id testname)
  (rmt:send-receive 'get-count-tests-running-for-testname run-id (list run-id testname)))

(define (rmt:get-count-tests-running-in-jobgroup run-id jobgroup)
  (rmt:send-receive 'get-count-tests-running-in-jobgroup run-id (list run-id jobgroup)))

Modified runs.scm from [030b929939] to [01904d0a79].

319
320
321
322
323
324
325
326

327
328
329
330
331
332
333
319
320
321
322
323
324
325

326
327
328
329
330
331
332
333







-
+







  ;; Take advantage of a good place to exit if running the one-pass methodology
  (if (and (> (runs:dat-can-run-more-tests-count runsdat) 20)
	   (args:get-arg "-one-pass"))
      (exit 0))

  (if (runs:dat-load-mgmt-function runsdat)((runs:dat-load-mgmt-function runsdat)))

  (let* ((num-running             (rmt:get-count-tests-running run-id #f)) ;; fastmode=no
  (let* ((num-running             (rmt:get-count-tests-running run-id))
	 (num-running-in-jobgroup (rmt:get-count-tests-running-in-jobgroup run-id jobgroup))
	 (job-group-limit         (let ((jobg-count (configf:lookup *configdat* "jobgroups" jobgroup)))
				    (if (string? jobg-count)
					(string->number jobg-count)
					jobg-count))))
    (if (> (+ num-running num-running-in-jobgroup) 0)
	(runs:inc-can-run-more-tests-count runsdat)) ;; (set! *runs:can-run-more-tests-count* (+ *runs:can-run-more-tests-count* 1)))
1563
1564
1565
1566
1567
1568
1569
1570

1571
1572
1573
1574
1575
1576
1577
1578
1579
1580
1581
1582
1583
1584
1585

1586
1587
1588
1589
1590
1591
1592
1563
1564
1565
1566
1567
1568
1569

1570
1571
1572
1573
1574
1575
1576
1577
1578
1579
1580
1581
1582
1583
1584
1585
1586
1587
1588
1589
1590
1591
1592
1593







-
+















+







				       ))
				   extras)
				  extras)
				'())))
	     (waitons     (delete-duplicates (append (tests:testqueue-get-waitons test-record) extra-waits) equal?))
	     (newtal      (append tal (list hed)))
	     (regfull     (>= (length reg) reglen))
	     (num-running (rmt:get-count-tests-running-for-run-id run-id #t)) ;; fastmode=yes
	     (num-running (rmt:get-count-tests-running-for-run-id run-id))
	     (testdat     (make-runs:testdat
			   hed: hed
			   tal: tal
			   reg: reg
			   reruns: reruns
			   test-record: test-record
			   test-name:   test-name
			   item-path:   item-path
			   jobgroup:    jobgroup
			   waitons:     waitons
			   testmode:    testmode
			   newtal:      newtal
			   itemmaps:    itemmaps
			   ;; prereqs-not-met: prereqs-not-met
			   )))
	(thread-sleep! 0.2) ;; no point in blasting though this a thousand times a second
	(runs:dat-regfull-set! runsdat regfull)
    
	(if (> num-running 0)
            (set! last-time-some-running (current-seconds)))

        (if (> (current-seconds)(+ last-time-some-running (or (configf:lookup *configdat* "setup" "give-up-waiting") 36000)))
            (hash-table-set! *max-tries-hash* tfullname (+ (hash-table-ref/default *max-tries-hash* tfullname 0) 1)))
1829
1830
1831
1832
1833
1834
1835
1836

1837
1838
1839
1840
1841
1842
1843
1844
1845
1846

1847
1848
1849
1850
1851
1852
1853
1854
1855
1856

1857
1858
1859
1860
1861
1862
1863
1830
1831
1832
1833
1834
1835
1836

1837
1838
1839
1840
1841
1842
1843
1844
1845
1846

1847
1848
1849
1850
1851
1852
1853
1854
1855
1856

1857
1858
1859
1860
1861
1862
1863
1864







-
+









-
+









-
+







    (rmt:set-var (conc "lunch-complete-" run-id) "yes")  
        
    ;; now *if* -run-wait we wait for all tests to be done
    ;; Now wait for any RUNNING tests to complete (if in run-wait mode)
    ;; (if (runs:dat-load-mgmt-function runsdat)((runs:dat-load-mgmt-function runsdat)))
    (thread-sleep! 10) ;; I think there is a race condition here. Let states/statuses settle
    
    (let wait-loop ((num-running      (rmt:get-count-tests-running-for-run-id run-id #t)) ;; fastmode=yes
    (let wait-loop ((num-running      (rmt:get-count-tests-running-for-run-id run-id))
		    (prev-num-running 0))
      ;; (debug:print-info 13 *default-log-port* "num-running=" num-running ", prev-num-running=" prev-num-running)
      (if (and (or (args:get-arg "-run-wait")
		   (equal? (configf:lookup *configdat* "setup" "run-wait") "yes"))
	       (> num-running 0))
	  (begin
	    ;; Here we mark any old defunct tests as incomplete. Do this every fifteen minutes
	    ;; (debug:print 0 *default-log-port* "Got here eh! num-running=" num-running " (> num-running 0) " (> num-running 0))
	    (if (> (current-seconds)(+ last-time-incomplete 900))
		(let ((actual-num-running (rmt:get-count-tests-running-for-run-id run-id #f))) ;; fastmode=no
		(let ((actual-num-running (rmt:get-count-tests-running-for-run-id run-id)))
		  (debug:print-info 0 *default-log-port* "Marking stuck tests as INCOMPLETE while waiting for run " run-id
				    ". Running as pid " (current-process-id) " on " (get-host-name))
		  (set! last-time-incomplete (current-seconds)) ;; FIXME, this might be causing slow down - use of set!
		  (rmt:find-and-mark-incomplete run-id #f)
		  (debug:print-info 0 *default-log-port* "run-wait specified, waiting on " actual-num-running
				    " tests in RUNNING, REMOTEHOSTSTART or LAUNCHED state at "
				    (time->string (seconds->local-time (current-seconds))))))
	    ;; (if (runs:dat-load-mgmt-function runsdat)((runs:dat-load-mgmt-function runsdat)))
	    (thread-sleep! 5) ;; (if (>= num-running max-concurrent-jobs) 5 1))
	    (wait-loop (rmt:get-count-tests-running-for-run-id run-id #t) ;; fastmode=yes
	    (wait-loop (rmt:get-count-tests-running-for-run-id run-id)
		       num-running))))
    ;; LET* ((test-record
    ;; we get here on "drop through". All done!
    ;; this is moved to runs:run-testes since this function is getting called twice to ensure everthing is completed. 
    ;; (debug:print-info 0 *default-log-port* "Calling Post Hook")    
    ;; (runs:run-post-hook run-id)
    (debug:print-info 1 *default-log-port* "All tests launched")))