Changes In Branch 1.65-archive Excluding Merge-Ins
This is equivalent to a diff from 2769e4b7c9 to 50829a5607
2021-03-01
| ||
17:42 | Manually patched in the new view check-in: f5206150ee user: mrwellan tags: v1.6569-new-view | |
2021-01-26
| ||
14:00 | Fix for the > crash. Maybe... Leaf check-in: 5a05fc04ff user: matt tags: v1.6569-gt-crash-fix | |
2021-01-25
| ||
12:03 | rebased lazy-queue rollup check-in: 07ab120544 user: matt tags: v1.65-lazyqueue-items-rollup | |
2021-01-15
| ||
22:46 | begin diet check-in: badd71f3b3 user: matt tags: v1.6569-diet | |
21:34 | eval-string-in-environment if was disabled, re-enabled check-in: 9564772564 user: matt tags: v1.6569-reenable-eval-if | |
2021-01-08
| ||
11:42 | enable custom value for max delay between archive time and test last update time Leaf check-in: 86a3d1148e user: pjhatwal tags: v1.6569-refactor | |
2020-11-25
| ||
12:00 | Fixed issues in server gating code Leaf check-in: 063273e8cb user: mrwellan tags: v1.6569-server-gate-fix | |
2020-11-24
| ||
22:27 | Added support for resetting run - allows to reload tests-paths to add tests to a run part way though. Just run megatest -clean-cache -runname $MT_RUNNAME Leaf check-in: 213021e02d user: mrwellan tags: v1.6596-reload-tests-paths | |
2020-10-13
| ||
16:46 | Changed version from 69 to 76. No other changes. Will compile with chicken 13 check-in: 87ca35010f user: mmgraham tags: v1.65, v1.6576 | |
2020-10-12
| ||
16:49 | Reduced message from failed to info. Reverted a delay which seems to help pass full stack ext-tests. Leaf check-in: 9e35b1252c user: mrwellan tags: v1.65-minor-patch | |
10:18 | Safe vector access in rmt. check-in: 58bb6d997a user: mrwellan tags: v1.65-side2 | |
2020-10-11
| ||
22:46 | Patched forward adjutant code. check-in: f936717bfa user: matt tags: v1.65-adjutant-again | |
2020-10-05
| ||
22:49 | Do not exit on failure to create directory - race conditons on NFS cause false fail scenarios - just keep going and cross your fingers... (cherrypicked from v1.6572) check-in: 05b253a452 user: matt tags: v1.65-sidework | |
22:46 | run duration testdat check-in: 4a0b43f3c6 user: matt tags: v1.65-test-rundat2 | |
2020-09-22
| ||
15:41 | updates to not start wachdog for -archive replicate-db Closed-Leaf check-in: 50829a5607 user: pjhatwal tags: 1.65-archive | |
2020-09-21
| ||
15:40 | catch up with v1.65 check-in: d625275890 user: pjhatwal tags: 1.65-archive | |
15:36 | merged in 1.65-test-rundat branch ==/FAIL/orion,mars/== check-in: cfd25d66e9 user: mmgraham tags: v1.6571, v1.65-failed-testdat | |
07:00 | Added get-testsuite-name all over launch:setup and still not set when needed! This did NOT work. Closed-Leaf check-in: 2efe8ad422 user: mrwellan tags: v1.65-get-testsuitename | |
2020-09-19
| ||
04:21 | Start moving test_rundat to no-sync db. ==/20/2/WARN/1203/mars/== check-in: abfabdb839 user: matt tags: v1.65-test-rundat | |
2020-09-18
| ||
17:30 | added check for file existence before file delete ==/14/1.9/WARN/orion,mars/== NOTE: This is the last v1.65 before the split off. I.e code from before this point IS in the far future v1.65 branch. Code from this point to that branch might NOT be in the branch. check-in: 2769e4b7c9 user: mmgraham tags: v1.65, v1.6569 | |
12:27 | cherry picked 2 fixes, changed version to 1.6569 ==/7.2/2.0/PASS/1201/mars/== check-in: d145d0eb02 user: mmgraham tags: v1.65 | |
Modified TODO from [da5eae4898] to [0885dee1e5].
︙ | ︙ | |||
14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 | # # You should have received a copy of the GNU General Public License # along with Megatest. If not, see <http://www.gnu.org/licenses/>. TODO ==== WW15 . fill newview matrix with data, filter pipeline gui elements . improve [script], especially indent handling WW16 . split db into megatest.db (runs etc.) db/<something>.db . release basic newview implementation WW18 . release split db implementation . mtutil calls from dashboard (for remote control) . logs browser (esp. for surfacing mtutil related activities) WW19 . break command line into sections; all, run control, queries, utilities etc. . pull in ftfplan (not integrated, just code pulled in) WW20 . ./configure => ubuntu, sles11, sles12, rh7 | > > > > > > > < | 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 | # # You should have received a copy of the GNU General Public License # along with Megatest. If not, see <http://www.gnu.org/licenses/>. TODO ==== WW38 . Add test_rundat to no-sync ==> correction, put in <testdir>/.meta/test-run.dat . Add STATE/STATUS transitions to .meta/test-run.dat or similar . Swizzle update-test-rundat to operate on no-sync . Swizzle update-run-duration, -uname-host and cpuload-diskfree to no-sync . On state/status change update tests table with duration WW15 . fill newview matrix with data, filter pipeline gui elements . improve [script], especially indent handling WW16 . split db into megatest.db (runs etc.) db/<something>.db . release basic newview implementation WW18 . release split db implementation . mtutil calls from dashboard (for remote control) . logs browser (esp. for surfacing mtutil related activities) WW19 . break command line into sections; all, run control, queries, utilities etc. . pull in ftfplan (not integrated, just code pulled in) WW20 . ./configure => ubuntu, sles11, sles12, rh7 . Add output flushing in teamcity support . Switch to using simple runs query everywhere . Add end_time to runs and add a rollup call that sets state, status and end_time Future . Switch to scsh-process pipeline management for job execution/control . Use call-with-environment-variables more. |
︙ | ︙ |
Modified archive.scm from [f391351322] to [0b2855ba26].
︙ | ︙ | |||
118 119 120 121 122 123 124 | (let ((res (cons block-id archive-path))) (hash-table-set! blockid-cache key res) res) (begin (debug:print 0 *default-log-port* "WARNING: no disk found for " target ", " run-name ", " test-name ", archive-path=" archive-path) #f))) (begin | | | 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 | (let ((res (cons block-id archive-path))) (hash-table-set! blockid-cache key res) res) (begin (debug:print 0 *default-log-port* "WARNING: no disk found for " target ", " run-name ", " test-name ", archive-path=" archive-path) #f))) (begin (debug:print 0 *default-log-port* "WARNING: no disk found for " target ", " run-name ", " test-name ) #f)))))) ;; no best disk found ;; archive - run bup ;; ;; 1. create the bup dir if not exists ;; 2. start the du of each directory ;; 3. gen index |
︙ | ︙ | |||
222 223 224 225 226 227 228 | (toplevel/children (debug:print 0 *default-log-port* "WARNING: cannot archive " test-name " with id " test-id " as it is a toplevel test with children")) ((not (common:file-exists? test-path)) (debug:print 0 *default-log-port* "WARNING: Cannot archive " test-name "/" item-path " as path " test-path " does not exist")) (else | | | 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 | (toplevel/children (debug:print 0 *default-log-port* "WARNING: cannot archive " test-name " with id " test-id " as it is a toplevel test with children")) ((not (common:file-exists? test-path)) (debug:print 0 *default-log-port* "WARNING: Cannot archive " test-name "/" item-path " as path " test-path " does not exist")) (else (debug:print 2 *default-log-port* "From test-dat=" test-dat " derived the following:\n" "test-partial-path = " test-partial-path "\n" "test-path = " test-path "\n" "test-physical-path = " test-physical-path "\n" "partial-path-index = " partial-path-index "\n" "test-base = " test-base) (hash-table-set! disk-groups test-base |
︙ | ︙ | |||
266 267 268 269 270 271 272 | (create-directory archive-dir #t)) (case archiver ((bup) ;; Archive using bup (let* ((bup-init-params (list "-d" archive-dir "init")) (bup-index-params (append (list "-d" archive-dir "index") test-paths)) (bup-save-params (append (list "-d" archive-dir "save" ;; (conc "--strip-path=" linktree) (conc "-" compress) ;; or (conc "--compress=" compress) | | | | | > > > > | | > > > > | | > > > > | 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 | (create-directory archive-dir #t)) (case archiver ((bup) ;; Archive using bup (let* ((bup-init-params (list "-d" archive-dir "init")) (bup-index-params (append (list "-d" archive-dir "index") test-paths)) (bup-save-params (append (list "-d" archive-dir "save" ;; (conc "--strip-path=" linktree) (conc "-" compress) ;; or (conc "--compress=" compress) "-n" (conc (common:get-testsuite-name) "-"(string-substitute "/" "-" target " ")) (conc "--strip-path=" (conc test-base target "/" )) ;; if we push to the directory do we need this? ) test-paths))) (if (not (common:file-exists? (conc archive-dir "/HEAD"))) (begin ;; replace this with jobrunner stuff enventually (debug:print-info 2 *default-log-port* "Init bup in " archive-dir) ;; (mutex-lock! bup-mutex) (let-values (((pid-val exit-status exit-code) (run-n-wait bup-exe params: bup-init-params print-cmd: print-prefix))) (if (not (eq? exit-code 0)) (begin (debug:print-error 0 *default-log-port* "There was an error initializing bup. Archive failed.") (exit 1)))) ;; (mutex-unlock! bup-mutex) )) (debug:print-info 2 *default-log-port* "Indexing data to be archived") ;; (mutex-lock! bup-mutex) (let-values (((pid-val exit-status exit-code) (run-n-wait bup-exe params: bup-index-params print-cmd: print-prefix))) (if (not (eq? exit-code 0)) (begin (debug:print-error 0 *default-log-port* "There was an error Indexing bup. Archive failed.") (exit 1)))) (debug:print-info 2 *default-log-port* "Archiving data with bup") (let-values (((pid-val exit-status exit-code) (run-n-wait bup-exe params: bup-save-params print-cmd: print-prefix))) (if (not (eq? exit-code 0)) (begin (debug:print-error 0 *default-log-port* "There was an archiving data with bup. Archive failed.") (exit 1)))))) ((7z tar) (for-each (lambda (test-dat) (let* ((test-id (db:test-get-id test-dat)) (test-name (db:test-get-testname test-dat)) (item-path (db:test-get-item-path test-dat)) (test-full-name (db:test-make-full-name test-name item-path)) |
︙ | ︙ | |||
334 335 336 337 338 339 340 | (rsync-exe (or (configf:lookup *configdat* "archive" "rsync") "rsync")) (print-prefix "Running: ") (archive-info (archive:allocate-new-archive-block blockid-cache *toppath* tsname min-space target-patt run-patt "megatest-db")) (archive-dir (if archive-info (cdr archive-info) #f)) (archive-id (if archive-info (car archive-info) -1)) (home-host (common:get-homehost)) (archive-time (seconds->std-time-str (current-seconds))) | | | | > > > > | | > > > > | | > > > > > | | > | < | 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 | (rsync-exe (or (configf:lookup *configdat* "archive" "rsync") "rsync")) (print-prefix "Running: ") (archive-info (archive:allocate-new-archive-block blockid-cache *toppath* tsname min-space target-patt run-patt "megatest-db")) (archive-dir (if archive-info (cdr archive-info) #f)) (archive-id (if archive-info (car archive-info) -1)) (home-host (common:get-homehost)) (archive-time (seconds->std-time-str (current-seconds))) (archive-staging-db (conc *toppath* "/.db-snapshot/archive_" archive-time)) (tmp-db-path (conc (common:get-db-tmp-area) "/megatest.db")) (dbfile (conc archive-staging-db "/megatest.db"))) (create-directory archive-staging-db #t) (let-values (((pid-val exit-status exit-code) (run-n-wait rsync-exe params: (list "-v" (conc (car home-host) ":"tmp-db-path) archive-staging-db) print-cmd: print-prefix))) (if (eq? exit-code 0) (case archiver ((bup) ;; Archive using bup (let* ((bup-init-params (list "-d" archive-dir "init")) (bup-index-params (list "-d" archive-dir "index" archive-staging-db)) (bup-save-params (list "-d" archive-dir "save" ;; (conc "--strip-path=" linktree) (conc "-" compress) ;; or (conc "--compress=" compress) "-n" (conc tsname "-megatest-db" ) (conc "--strip-path=" archive-staging-db ) ;; if we push to the directory do we need this? dbfile))) (if (not (common:file-exists? (conc archive-dir "/HEAD"))) (begin ;; replace this with jobrunner stuff enventually (debug:print-info 2 *default-log-port* "Init bup in " archive-dir) (let-values (((pid-val exit-status exit-code)(run-n-wait bup-exe params: bup-init-params print-cmd: print-prefix))) (if (not (eq? exit-code 0)) (begin (debug:print-error 0 *default-log-port* "There was an error initializing bup. Archive failed.") (exit 1)))))) (debug:print-info 2 *default-log-port* "Indexing data to be archived") (let-values (((pid-val exit-status exit-code) (run-n-wait bup-exe params: bup-index-params print-cmd: print-prefix))) (if (not (eq? exit-code 0)) (begin (debug:print-error 0 *default-log-port* "There was an error Indexing bup. Archive failed.") (exit 1)))) (debug:print-info 2 *default-log-port* "Archiving data with bup") (let-values (((pid-val exit-status exit-code) (run-n-wait bup-exe params: bup-save-params print-cmd: print-prefix))) (if (not (eq? exit-code 0)) (begin (debug:print-error 0 *default-log-port* "There was an error archiving data with bup. Archive failed.") (exit 1)) (debug:print-info 0 *default-log-port* "To restore megatest.db run megatest -archive replicacte-db -source archive-dir -time-stamp <ts>. Current timestamp: " (seconds->std-time-str (current-seconds))))))) (else (debug:print-info 0 *default-log-port* "No support for databse archiving with " archiver))) (debug:print-error 0 *default-log-port* "There was an error rsyncing tmp database"))))) (define (archive:restore-db archive-path ts) (let* ((bup-exe (or (configf:lookup *configdat* "archive" "bup") "bup")) (archive-internal-path (conc (common:get-testsuite-name) "-megatest-db/" ts "/megatest.db" )) (bup-restore-params (list "-d" archive-path "restore" "-C" *toppath* archive-internal-path))) (debug:print-info 2 *default-log-port* "Restoring archived data to " *toppath* " from archive in " archive-path " ... " archive-internal-path) (run-n-wait bup-exe params: bup-restore-params print-cmd: "Running:")) (sleep 2) (db:multi-db-sync (db:setup #f) 'killservers ;'dejunk ;'adj-testids 'old2new ) (debug:print-info 1 *default-log-port* "dropping triggers to update linktree") (rmt:drop-all-triggers) (let* ((linktree (common:get-linktree)) ;; (configf:lookup *configdat* "setup" "linktree"))) (src-archive-linktree (rmt:get-var "src-archive-linktree"))) (if (not (equal? src-archive-linktree linktree)) (rmt:update-tesdata-on-repilcate-db src-archive-linktree linktree)) (debug:print-info 1 *default-log-port* "creating triggers after updating linktree") (rmt:create-all-triggers) )) |
︙ | ︙ | |||
409 410 411 412 413 414 415 | (define (seconds->std-time-str sec) (time->string (seconds->local-time sec) "%Y-%m-%d-%H%M%S")) | | | | 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 | (define (seconds->std-time-str sec) (time->string (seconds->local-time sec) "%Y-%m-%d-%H%M%S")) (define (archive:get-timestamp-dir bup-exe archive-dir testsuite-name target test-partial-path test-last-update) (print (seconds->std-time-str test-last-update)) (let* ((internal-path (conc testsuite-name "-" target)) (ts-list (archive:ls->list bup-exe archive-dir internal-path)) (ds-flag (vector-ref (seconds->local-time) 8))) (let loop ((hed (car ts-list)) (tail (cdr ts-list))) (if (and (null? tail) (equal? hed "latest")) #f (if (and (not (null? tail)) (equal? hed "latest")) |
︙ | ︙ | |||
453 454 455 456 457 458 459 | (test-id (db:test-get-id test-dat)) (run-id (db:test-get-run_id test-dat)) (keyvals (rmt:get-key-val-pairs run-id)) (target (string-intersperse (map cadr keyvals) "/")) (toplevel/children (and (db:test-get-is-toplevel test-dat) (> (rmt:test-toplevel-num-items run-id test-name) 0))) | | | | | 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 | (test-id (db:test-get-id test-dat)) (run-id (db:test-get-run_id test-dat)) (keyvals (rmt:get-key-val-pairs run-id)) (target (string-intersperse (map cadr keyvals) "/")) (toplevel/children (and (db:test-get-is-toplevel test-dat) (> (rmt:test-toplevel-num-items run-id test-name) 0))) (test-partial-path (conc run-name "/" (db:test-make-full-name test-name item-path))) ;; note the trailing slash to get the dir inspite of it being a link (test-path (conc linktree "/" test-partial-path)) ;; if the old path was not deleted then prev-test-physical-path will end up pointing to a real directory (mutex-lock! rp-mutex) (prev-test-physical-path (if (common:file-exists? test-path) ;; (read-symbolic-link test-path #t) (common:real-path test-path) #f)) (mutex-unlock! rp-mutex) (new-test-physical-path (conc best-disk "/" test-partial-path)) (archive-block-id (db:test-get-archived test-dat)) (test-last-update (db:test-get-last_update test-dat)) (archive-block-info (rmt:test-get-archive-block-info archive-block-id)) (archive-path (if (vector? archive-block-info) (vector-ref archive-block-info 2) ;; look in db.scm for test-get-archive-block-info for the vector record info #f)) ;; no archive found? (archive-timestamp-dir (if archive-path (archive:get-timestamp-dir bup-exe archive-path (common:get-testsuite-name) (string-substitute "/" "-" target " ") test-partial-path test-last-update) #f)) (archive-internal-path (conc (common:get-testsuite-name) "-" (string-substitute "/" "-" target " ") "/" archive-timestamp-dir "/" test-partial-path)) (include-paths (args:get-arg "-include")) (exclude-pattern (args:get-arg "-exclude-rx")) (exclude-file (args:get-arg "-exclude-rx-from"))) (if (not archive-timestamp-dir) (debug:print-error 0 *default-log-port* "Archive not found for testsuite" (common:get-testsuite-name) " run/test/itempath" test-partial-path) (begin ;; some sanity checks, move an existing path out of the way - iif it is not a toplevel with children |
︙ | ︙ |
Modified db.scm from [fb3a18f52f] to [a4ece5d588].
︙ | ︙ | |||
1532 1533 1534 1535 1536 1537 1538 | id INTEGER PRIMARY KEY, test_id INTEGER, state TEXT DEFAULT 'new', status TEXT DEFAULT 'n/a', archive_type TEXT DEFAULT 'bup', du INTEGER, archive_path TEXT);"))) | < | 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 | id INTEGER PRIMARY KEY, test_id INTEGER, state TEXT DEFAULT 'new', status TEXT DEFAULT 'n/a', archive_type TEXT DEFAULT 'bup', du INTEGER, archive_path TEXT);"))) (db:create-triggers db) db)) ;; ) ;;====================================================================== ;; A R C H I V E S ;;====================================================================== |
︙ | ︙ | |||
3465 3466 3467 3468 3469 3470 3471 | (let* ((run-ids (db:get-all-run-ids mtdb))) (for-each (lambda (run-id) (let ((testrecs (db:get-all-tests-info-by-run-id mtdb run-id))) (db:prep-megatest.db-adj-test-ids (db:dbdat-get-db mtdb) run-id testrecs))) run-ids))) | | | 3464 3465 3466 3467 3468 3469 3470 3471 3472 3473 3474 3475 3476 3477 3478 | (let* ((run-ids (db:get-all-run-ids mtdb))) (for-each (lambda (run-id) (let ((testrecs (db:get-all-tests-info-by-run-id mtdb run-id))) (db:prep-megatest.db-adj-test-ids (db:dbdat-get-db mtdb) run-id testrecs))) run-ids))) ;; Get test data using test_id, run-id is not used - but it will be! ;; (define (db:get-test-info-by-id dbstruct run-id test-id) (db:with-db dbstruct #f ;; run-id #f (lambda (db) |
︙ | ︙ |
Modified launch.scm from [d0067277fa] to [e9ff9ffdfc].
︙ | ︙ | |||
203 204 205 206 207 208 209 | (round (- (current-seconds) start-seconds))))) (kill-tries 0)) ;; (tests:set-full-meta-info #f test-id run-id (calc-minutes) work-area) ;; (tests:set-full-meta-info test-id run-id (calc-minutes) work-area) | | | > > | 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 | (round (- (current-seconds) start-seconds))))) (kill-tries 0)) ;; (tests:set-full-meta-info #f test-id run-id (calc-minutes) work-area) ;; (tests:set-full-meta-info test-id run-id (calc-minutes) work-area) (tests:set-full-meta-info #f test-id run-id (calc-minutes) work-area 10 update-db: #t) (let loop ((minutes (calc-minutes)) (cpu-load (alist-ref 'adj-core-load (common:get-normalized-cpu-load #f))) (disk-free (get-df (current-directory))) (last-sync (current-seconds))) ;; (common:telemetry-log "zombie" (conc "launch:monitor-job - ;; top of loop encountered at "(current-seconds)" with ;; last-sync="last-sync)) (let* ((over-time (> (current-seconds) (+ last-sync update-period))) (new-cpu-load (let* ((load (alist-ref 'adj-core-load (common:get-normalized-cpu-load #f))) (delta (abs (- load cpu-load)))) (if (> delta 0.1) ;; don't bother updating with small changes load #f))) (new-disk-free (let* ((df (if over-time ;; only get df every 30 seconds |
︙ | ︙ | |||
231 232 233 234 235 236 237 | (do-sync (or new-cpu-load new-disk-free over-time)) (test-info (rmt:get-test-info-by-id run-id test-id)) (state (db:test-get-state test-info)) (status (db:test-get-status test-info)) (kill-reason "no kill reason specified") (kill-job? #f)) | | | | < < < | < < | 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 | (do-sync (or new-cpu-load new-disk-free over-time)) (test-info (rmt:get-test-info-by-id run-id test-id)) (state (db:test-get-state test-info)) (status (db:test-get-status test-info)) (kill-reason "no kill reason specified") (kill-job? #f)) #;(common:telemetry-log "zombie" (conc "launch:monitor-job - decision time encountered at "(current-seconds)" with last-sync="last-sync" do-sync="do-sync" over-time="over-time" update-period="update-period)) (cond ((test-get-kill-request run-id test-id) (set! kill-reason "KILLING TEST since received kill request (KILLREQ)") (set! kill-job? #t)) ((and runtlim (> (- (current-seconds) start-seconds) runtlim)) (set! kill-reason (conc "KILLING TEST DUE TO TIME LIMIT EXCEEDED! Runtime=" (- (current-seconds) start-seconds) " seconds, limit=" runtlim)) (set! kill-job? #t)) ((equal? status "DEAD") (tests:update-central-meta-info run-id test-id new-cpu-load new-disk-free (calc-minutes) #f #f update-db: #t) (rmt:set-state-status-and-roll-up-items run-id test-id 'foo "RUNNING" "n/a" "was marked dead; really still running.") ;;(set! kill-reason "KILLING TEST because it was marked as DEAD by launch:handle-zombie-tests (might indicate really overloaded server or else overzealous setup.deadtime)") ;; MARK RUNNING (set! kill-job? #f))) (debug:print 4 *default-log-port* "cpu: " new-cpu-load " disk: " new-disk-free " last-sync: " last-sync " do-sync: " do-sync) (launch:handle-zombie-tests run-id) (if do-sync ;; save meta data about the running of this test (tests:update-central-meta-info run-id test-id new-cpu-load new-disk-free (calc-minutes) #f #f)) (if kill-job? (begin (debug:print-info 0 *default-log-port* "proceeding to kill test: "kill-reason) (mutex-lock! m) ;; NOTE: The pid can change as different steps are run. Do we need handshaking between this ;; section and the runit section? Or add a loop that tries three times with a 1/4 second ;; between tries? |
︙ | ︙ | |||
310 311 312 313 314 315 316 | (begin (thread-sleep! 3) ;; (+ 3 (random 6))) ;; add some jitter to the call home time to spread out the db accesses (if (hash-table-ref/default misc-flags 'keep-going #f) ;; keep originals for cpu-load and disk-free unless they change more than the allowed delta (loop (calc-minutes) (or new-cpu-load cpu-load) (or new-disk-free disk-free) (if do-sync (current-seconds) last-sync))))))) | | | 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 | (begin (thread-sleep! 3) ;; (+ 3 (random 6))) ;; add some jitter to the call home time to spread out the db accesses (if (hash-table-ref/default misc-flags 'keep-going #f) ;; keep originals for cpu-load and disk-free unless they change more than the allowed delta (loop (calc-minutes) (or new-cpu-load cpu-load) (or new-disk-free disk-free) (if do-sync (current-seconds) last-sync))))))) (tests:update-central-meta-info run-id test-id (get-cpu-load) (get-df (current-directory))(calc-minutes) #f #f update-db: #t))) ;; NOTE: Checking twice for keep-going is intentional (define (launch:execute encoded-cmd) (let* ((cmdinfo (common:read-encoded-string encoded-cmd)) (tconfigreg #f)) (setenv "MT_CMDINFO" encoded-cmd) ;;(bb-check-path msg: "launch:execute incoming") |
︙ | ︙ | |||
463 464 465 466 467 468 469 | (db:test-get-host test-info) (begin (debug:print 0 *default-log-port* "ERROR: failed to find a record for test-id " test-id ", exiting.") (exit)))) (test-pid (db:test-get-process_id test-info))) (cond ;; -mrw- I'm removing KILLREQ from this list so that a test in KILLREQ state is treated as a "do not run" flag. | | > > | 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 | (db:test-get-host test-info) (begin (debug:print 0 *default-log-port* "ERROR: failed to find a record for test-id " test-id ", exiting.") (exit)))) (test-pid (db:test-get-process_id test-info))) (cond ;; -mrw- I'm removing KILLREQ from this list so that a test in KILLREQ state is treated as a "do not run" flag. ((or (member (db:test-get-state test-info) '("INCOMPLETE" "KILLED" "UNKNOWN" "STUCK")) ;; prior run of this test didn't complete, go ahead and try to rerun (and (equal? (db:test-get-state test-info) "COMPLETED") ;; completed/abort => rerun if asked (member (db:test-get-status test-info) '("ABORT")))) (debug:print 0 *default-log-port* "INFO: test is INCOMPLETE or KILLED, treat this execute call as a rerun request") ;; (tests:test-force-state-status! run-id test-id "REMOTEHOSTSTART" "n/a") (rmt:general-call 'set-test-start-time #f test-id) (rmt:test-set-state-status run-id test-id "REMOTEHOSTSTART" "n/a" #f) ) ;; prime it for running ((member (db:test-get-state test-info) '("RUNNING" "REMOTEHOSTSTART")) |
︙ | ︙ |
Modified megatest.scm from [0e58f17e0f] to [fd8579cb9b].
︙ | ︙ | |||
517 518 519 520 521 522 523 | "-list-disks" "-list-targets" "-show-runconfig" ;;"-list-db-targets" "-show-runconfig" "-show-config" "-show-cmdinfo" | | > > > > > > > > > > | | | 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 | "-list-disks" "-list-targets" "-show-runconfig" ;;"-list-db-targets" "-show-runconfig" "-show-config" "-show-cmdinfo" "-cleanup-db" )) (no-watchdog-argvals (list '("-archive" . "replicate-db"))) (start-watchdog-specail-arg-val (let loop ((hed (car no-watchdog-argvals)) (tail (cdr no-watchdog-argvals))) (print "hed" hed " arg " (args:get-arg (car hed)) " val:" (cdr hed) " eql" (equal? (args:get-arg (car hed)) (cdr hed))) (if (equal? (args:get-arg (car hed)) (cdr hed)) #f (if (null? tail) #t (loop (car tail) (cdr tail)))))) (no-watchdog-args-vals (filter (lambda (x) x) (map args:get-arg no-watchdog-args))) (start-watchdog (and (null? no-watchdog-args-vals) start-watchdog-specail-arg-val))) ;(print "no-watchdog-args="no-watchdog-args "no-watchdog-args-vals="no-watchdog-args-vals " start-watchdog-specail-arg-val:" start-watchdog-specail-arg-val " start-watchdog:" start-watchdog) (if start-watchdog (thread-start! *watchdog*))) ;; bracket open-output-file with code to make leading directory if it does not exist and handle exceptions (define (open-logfile logpath-in) (condition-case |
︙ | ︙ | |||
1967 1968 1969 1970 1971 1972 1973 | (print path)) paths)))))) ;;====================================================================== ;; Archive tests ;;====================================================================== ;; Archive tests matching target, runname, and testpatt | | | | | 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 | (print path)) paths)))))) ;;====================================================================== ;; Archive tests ;;====================================================================== ;; Archive tests matching target, runname, and testpatt (if (equal? (args:get-arg "-archive") "replicate-db") (begin ;; check if source ;; check if megatest.db exist (launch:setup) (if (not (args:get-arg "-source")) (begin (debug:print-info 1 *default-log-port* "Missing required argument -source <archive path>") (exit 1))) (if (common:file-exists? (conc *toppath* "/megatest.db")) (begin (debug:print-info 1 *default-log-port* "File " (conc *toppath* "/megatest.db") " already exists. Please remove it before trying to replicate db") |
︙ | ︙ | |||
1998 1999 2000 2001 2002 2003 2004 | (begin (archive:restore-db src ts) (set! *didsomething* #t)) (begin (debug:print-error 1 *default-log-port* "Path " source " not found") (exit 1)))))) ;; else do a general-run-call | | | 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 | (begin (archive:restore-db src ts) (set! *didsomething* #t)) (begin (debug:print-error 1 *default-log-port* "Path " source " not found") (exit 1)))))) ;; else do a general-run-call (if (and (args:get-arg "-archive") (not (equal? (args:get-arg "-archive") "replicate-db"))) (begin ;; for the archive get we need to preserve the starting dir as part of the target path (if (and (args:get-arg "-dest") (not (equal? (substring (args:get-arg "-dest") 0 1) "/"))) (let ((newpath (conc (current-directory) "/" (args:get-arg "-dest")))) (debug:print-info 1 *default-log-port* "Preserving original path to destination, was " (args:get-arg "-dest") ", now " newpath) (hash-table-set! args:arg-hash "-dest" newpath))) |
︙ | ︙ |
Modified rmt.scm from [39d97c528a] to [29d7593e43].
︙ | ︙ | |||
20 21 22 23 24 25 26 27 28 29 30 31 32 33 | (use format typed-records) ;; RADT => purpose of json format?? (declare (unit rmt)) (declare (uses api)) (declare (uses http-transport)) (include "common_records.scm") ;; (declare (uses rmtmod)) ;; (import rmtmod) ;; ;; THESE ARE ALL CALLED ON THE CLIENT SIDE!!! ;; | > > | 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 | (use format typed-records) ;; RADT => purpose of json format?? (declare (unit rmt)) (declare (uses api)) (declare (uses http-transport)) (include "common_records.scm") (include "db_records.scm") ;; (declare (uses rmtmod)) ;; (import rmtmod) ;; ;; THESE ARE ALL CALLED ON THE CLIENT SIDE!!! ;; |
︙ | ︙ | |||
523 524 525 526 527 528 529 | ;; Just some syntatic sugar (define (rmt:register-test run-id test-name item-path) (rmt:general-call 'register-test run-id run-id test-name item-path)) (define (rmt:get-test-id run-id testname item-path) (rmt:send-receive 'get-test-id run-id (list run-id testname item-path))) | | | > > > > > > > > > | 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 | ;; Just some syntatic sugar (define (rmt:register-test run-id test-name item-path) (rmt:general-call 'register-test run-id run-id test-name item-path)) (define (rmt:get-test-id run-id testname item-path) (rmt:send-receive 'get-test-id run-id (list run-id testname item-path))) ;; run-id is NOT used - but it will be! ;; (define (rmt:get-test-info-by-id run-id test-id) (if (number? test-id) (let* ((testdat (rmt:send-receive 'get-test-info-by-id run-id (list run-id test-id))) (trundatf (conc (db:test-get-rundir testdat) "/.mt_data/test-run.dat"))) ;; now we can update a couple fields from the filesystem (if (and (db:test-get-rundir testdat) (file-exists? trundatf)) (let* ((duration (db:test-get-run_duration testdat)) (event-time (db:test-get-event_time testdat)) (last-touch (file-modification-time trundatf))) (db:test-set-run_duration! testdat (max duration (- last-touch event-time))))) testdat) (begin (debug:print 0 *default-log-port* "WARNING: Bad data handed to rmt:get-test-info-by-id run-id=" run-id ", test-id=" test-id) (print-call-chain (current-error-port)) #f))) (define (rmt:test-get-rundir-from-test-id run-id test-id) (rmt:send-receive 'test-get-rundir-from-test-id run-id (list run-id test-id))) |
︙ | ︙ |
Modified runs.scm from [030b929939] to [ca48301598].
︙ | ︙ | |||
2200 2201 2202 2203 2204 2205 2206 | ;; delete redundant runs within a target - N is the input ;; delete redundant runs within a target IFF older than given date/time AND keep at least N ;; (define (runs:remove-all-but-last-n-runs-per-target target-patts runpatt num-to-keep #!key (actions '(print))) (let* ((runs-ht (runs:get-hash-by-target target-patts runpatt)) (age (if (args:get-arg "-age")(common:hms-string->seconds (args:get-arg "-age")) #f)) (age-mark (if age (- (current-seconds) age) (+ (current-seconds) 86400))) | | > > > > > > | > > | | | | | | | | | | | | | | | | | < < | < < | < < | > | 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 | ;; delete redundant runs within a target - N is the input ;; delete redundant runs within a target IFF older than given date/time AND keep at least N ;; (define (runs:remove-all-but-last-n-runs-per-target target-patts runpatt num-to-keep #!key (actions '(print))) (let* ((runs-ht (runs:get-hash-by-target target-patts runpatt)) (age (if (args:get-arg "-age")(common:hms-string->seconds (args:get-arg "-age")) #f)) (age-mark (if age (- (current-seconds) age) (+ (current-seconds) 86400))) (precmd (or (args:get-arg "-precmd") "")) (action-chk (member (string->symbol "remove-runs") actions))) ;; check the sequence of actions archive must comme before remove-runs (if (and action-chk (member (string->symbol "archive") action-chk)) (begin (debug:print-error 0 *default-log-port* "action remove-runs must come after archive") (exit 1))) (print "Actions: " actions " age: " age) (for-each (lambda (action) (for-each (lambda (target) (let* ((runs (hash-table-ref runs-ht target)) (sorted (sort runs (lambda (a b)(< (simple-run-event_time a)(simple-run-event_time b))))) (to-remove (let* ((len (length sorted)) (trim-amt (- len num-to-keep))) (if (> trim-amt 0) (take sorted trim-amt) '())))) (hash-table-set! runs-ht target to-remove) (print target ":") (for-each (lambda (run) (let ((remove (member run to-remove (lambda (a b) (eq? (simple-run-id a) (simple-run-id b)))))) (if (and age (> (simple-run-event_time run) age-mark)) (print "Skipping handling of " target "/" (simple-run-runname run) " as it is younger than " (args:get-arg "-age")) (case action ((print) (print " " (simple-run-runname run) " " (time->string (seconds->local-time (simple-run-event_time run)) "WW%V.%u %H:%M:%S") " " (if remove "REMOVE" ""))) ((remove-runs) (if remove (system (conc precmd " megatest -remove-runs -target " target " -runname " (simple-run-runname run) " -testpatt %" (if (member 'kill-runs actions) ;; if kill-runs is specified then set -kill-wait to 0 " -kill-wait 0" ""))))) ((archive) (if remove (system (conc precmd " megatest -archive save-remove -target " target " -runname " (simple-run-runname run) " -testpatt %")))) ((kill-runs) (if remove (system (conc precmd " megatest -kill-runs -target " target " -runname " (simple-run-runname run) " -testpatt %")))))))) sorted))) (hash-table-keys runs-ht))) actions) runs-ht)) (define (remove-last-path-directory path-in) (let* ((dparts (string-split path-in "/")) (path-out (conc "/" (string-intersperse (take dparts (- (length dparts) 1)) "/"))) ) path-out |
︙ | ︙ |
Modified sauth-common.scm from [28ffd8e69e] to [5771575e2e].
︙ | ︙ | |||
238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 | (sauthorize:db-do (lambda (db) (let* ((data-row (query fetch (sql db (conc "SELECT code, exe_name, id, basepath FROM areas where areas.code = '" code "'"))))) (set! obj data-row)))) ;(print obj) obj)) ;; function to validate the users input for target path and resolve the path ;; TODO: Check for restriction in subpath (define (sauth-common:resolve-path new current allowed-sheets) (let* ((target-path (append current (string-split new "/"))) (target-path-string (string-join target-path "/")) (normal-path (normalize-pathname target-path-string)) | > > > > > > > > > | 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 | (sauthorize:db-do (lambda (db) (let* ((data-row (query fetch (sql db (conc "SELECT code, exe_name, id, basepath FROM areas where areas.code = '" code "'"))))) (set! obj data-row)))) ;(print obj) obj)) (define (sauth-common:src-size path) (let ((output (with-input-from-pipe (conc "/usr/bin/du -s " path " | awk '{print $1}'") (lambda() (read-line))))) (string->number output))) (define (sauth-common:space-left-at-dest path) (let* ((output (run/string (pipe (df ,path ) (tail -1)))) (size (caddr (cdr (string-split output " "))))) (string->number size))) ;; function to validate the users input for target path and resolve the path ;; TODO: Check for restriction in subpath (define (sauth-common:resolve-path new current allowed-sheets) (let* ((target-path (append current (string-split new "/"))) (target-path-string (string-join target-path "/")) (normal-path (normalize-pathname target-path-string)) |
︙ | ︙ | |||
277 278 279 280 281 282 283 | base-path (conc base-path "/" (string-join (cdr resolved-path) "/"))))) (if (and (not (equal? restricted-areas "" )) (string-match (regexp restrictions) target-path)) (begin | | | 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 | base-path (conc base-path "/" (string-join (cdr resolved-path) "/"))))) (if (and (not (equal? restricted-areas "" )) (string-match (regexp restrictions) target-path)) (begin (sauth:print-error (conc "Access denied to " (string-join resolved-path "/"))) ;(exit 1) #f) target-path) )) #f))) |
︙ | ︙ |
Modified spublish.scm from [0af43ce4a9] to [d0bcfc709c].
︙ | ︙ | |||
389 390 391 392 393 394 395 396 397 398 399 400 401 402 | (define (spublish:shell-cp src-path target-path) (cond ((not (file-exists? target-path)) (sauth:print-error (conc " target Directory " target-path " does not exist!!"))) ((not (file-exists? src-path)) (sauth:print-error (conc "Source path " src-path " does not exist!!" ))) (else (if (is_directory src-path) (begin (let* ((parent-dir src-path) (start-dir target-path)) (run (pipe (begin (system (conc "cd " parent-dir " ;tar chf - ." ))) (begin (change-directory start-dir) | > > > > | 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 | (define (spublish:shell-cp src-path target-path) (cond ((not (file-exists? target-path)) (sauth:print-error (conc " target Directory " target-path " does not exist!!"))) ((not (file-exists? src-path)) (sauth:print-error (conc "Source path " src-path " does not exist!!" ))) (else (if (< (sauth-common:space-left-at-dest target-path) (sauth-common:src-size src-path)) (begin (sauth:print-error "Destination does not have enough disk space.") (exit 1))) (if (is_directory src-path) (begin (let* ((parent-dir src-path) (start-dir target-path)) (run (pipe (begin (system (conc "cd " parent-dir " ;tar chf - ." ))) (begin (change-directory start-dir) |
︙ | ︙ |
Modified sretrieve.scm from [e7efdf8d00] to [15a6ca2860].
︙ | ︙ | |||
636 637 638 639 640 641 642 643 644 645 646 647 648 649 | (last-dir-name (if (pathname-extension target-path) (conc(pathname-file target-path) "." (pathname-extension target-path)) (pathname-file target-path))) (curr-dir (current-directory)) (start-dir (conc (current-directory) "/" last-dir-name)) (execlude (make-exclude-pattern (string-split restrictions ","))) (tmpfile (conc "/tmp/" (current-user-name) "/my-pipe-" (current-process-id)))) (if (file-exists? start-dir) (begin (sauth:print-error (conclast-dir-name " already exist in your work dir.")) (sauth:print-error "Nothing has been retrieved!! ")) (begin ; (sretrieve:do-as-calling-user ; (lambda () | > | 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 | (last-dir-name (if (pathname-extension target-path) (conc(pathname-file target-path) "." (pathname-extension target-path)) (pathname-file target-path))) (curr-dir (current-directory)) (start-dir (conc (current-directory) "/" last-dir-name)) (execlude (make-exclude-pattern (string-split restrictions ","))) (tmpfile (conc "/tmp/" (current-user-name) "/my-pipe-" (current-process-id)))) (sauth:print-error start-dir) (if (file-exists? start-dir) (begin (sauth:print-error (conclast-dir-name " already exist in your work dir.")) (sauth:print-error "Nothing has been retrieved!! ")) (begin ; (sretrieve:do-as-calling-user ; (lambda () |
︙ | ︙ |
Modified tests.scm from [0094b671e6] to [ef396dcaf8].
︙ | ︙ | |||
1942 1943 1944 1945 1946 1947 1948 | (lambda (count) (set! res count)) tdb "SELECT count(id) FROM test_rundat;") res)) 0) | > | > > > > > > > > > > > > > | > > | | | | | | | | < < < < < < < < < < < < < < < < < < < < < < < | 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 | (lambda (count) (set! res count)) tdb "SELECT count(id) FROM test_rundat;") res)) 0) ;; (define (tests:update-central-meta-info run-id test-id cpuload diskfree minutes uname hostname #!key (update-db #f)) (if (get-environment-variable "MT_TEST_RUN_DIR") (let* ((dest-dir (conc (get-environment-variable "MT_TEST_RUN_DIR") "/.mt_data")) (or-dash (lambda (instr)(if instr instr "-")))) (if (not (directory-exists? dest-dir))(create-directory dest-dir #t)) (let* ((outp (open-output-file (conc dest-dir "/test-run.dat") #:append))) (with-output-to-port outp (lambda () (print (current-seconds) " " (or-dash run-id) " " (or-dash test-id) " " (or-dash cpuload) " " (or-dash diskfree) " " (or-dash minutes) " " (or-dash hostname) " " (or-dash uname)))) ;; put uname last as it has spaces in it (close-output-port outp))) (begin (rmt:general-call 'update-test-rundat run-id test-id (current-seconds) (or cpuload -1)(or diskfree -1) -1 (or minutes -1)))) (if update-db (begin (if (and cpuload diskfree) (rmt:general-call 'update-cpuload-diskfree run-id cpuload diskfree test-id)) (if minutes (rmt:general-call 'update-run-duration run-id minutes test-id)) (if (and uname hostname) (rmt:general-call 'update-uname-host run-id uname hostname test-id))))) ;; This one is for running with no db access (i.e. via rmt: internally) (define (tests:set-full-meta-info db test-id run-id minutes work-area remtries #!key (update-db #f)) ;; (define (tests:set-full-meta-info test-id run-id minutes work-area) ;; (let ((remtries 10)) (let* ((cpuload (get-cpu-load)) (diskfree (get-df (current-directory))) (uname (get-uname "-srvpio")) (hostname (get-host-name))) (tests:update-central-meta-info run-id test-id cpuload diskfree minutes uname hostname update-db: update-db))) ;;====================================================================== ;; A R C H I V I N G ;;====================================================================== (define (test:archive db test-id) #f) |
︙ | ︙ |