Megatest: Diff

Differences From Artifact [89f36cc1cb]:

File runs.scm — part of check-in [0c7e3bc287] at 2022-05-13 12:16:07 on branch v1.65 — Fixed few things with the hasty implementation of global waitons. (user: mrwellan, size: 164093) [annotate] [blame] [check-ins using]

To Artifact [fe040354c9]:

File runs.scm — part of check-in [15a47376a6] at 2024-08-15 18:16:40 on branch v1.65-adjutant-patched-forward-defunct — Patched forward adjutant code (user: matt, size: 165653) [annotate] [blame] [check-ins using]

︙			︙
59 60 61 62 63 64 65 66 67 68 69 70 71 72 73	(last-jobs-check-time 0) ) (defstruct runs:testdat hed tal reg reruns test-record test-name item-path jobgroup waitons testmode newtal itemmaps prereqs-not-met) ;; look in the $MT_RUN_AREA_HOME/.softlocks directory for key-host-pid.softlock files ;; - remove any that are over 3600 seconds old ;; - if there are any that are younger than 10 seconds ;; * sleep 10 seconds ;; * touch my key-host-pid.softlock file ;; * return ;; - if there are no files younger than 10 seconds	\| > > > > > > > > > > > > > > > > > > > > > > > > > > >	59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100	(last-jobs-check-time 0) ) (defstruct runs:testdat hed tal reg reruns test-record test-name item-path jobgroup waitons testmode newtal itemmaps prereqs-not-met) (module runsmod ( runs:wait-if-seen-recently ) (import scheme chicken data-structures extras files) (import posix typed-records srfi-18 srfi-69 md5 message-digest regex srfi-1) (define last-seen-ht (make-hash-table)) (define (runs:wait-if-seen-recently wait-until . keys) (let* ((full-key (string-intersperse keys "-")) (last-seen (hash-table-ref/default last-seen-ht full-key 0)) (now (current-seconds)) (delta (- now last-seen)) (needed (if (< delta wait-until) 0 (- wait-until delta)))) (if (> needed 0)(thread-sleep! needed)) (hash-table-set! last-seen-ht full-key (current-seconds)) needed)) ) (import runsmod) ;; look in the $MT_RUN_AREA_HOME/.softlocks directory for key-host-pid.softlock files ;; - remove any that are over 3600 seconds old ;; - if there are any that are younger than 10 seconds ;; * sleep 10 seconds ;; * touch my key-host-pid.softlock file ;; * return ;; - if there are no files younger than 10 seconds
︙			︙
490 491 492 493 494 495 496 497 498 499 500 501 502 503	;; return #t when all items in waitors-upon list are represented in test-patt, #f otherwise. (define (runs:testpatts-mention-waitors-upon? test-patt waitors-upon) (null? (tests:filter-test-names-not-matched waitors-upon test-patt))) ;;====================================================================== ;; runs:run-tests is called from megatest.scm and itself ;;====================================================================== ;; ;; test-names: Comma separated patterns same as test-patts but used in selection ;; of tests to run. The item portions are not respected.	> >	517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532	;; return #t when all items in waitors-upon list are represented in test-patt, #f otherwise. (define (runs:testpatts-mention-waitors-upon? test-patt waitors-upon) (null? (tests:filter-test-names-not-matched waitors-upon test-patt))) (define find-and-mark-incomplete-last-run (make-hash-table)) ;;====================================================================== ;; runs:run-tests is called from megatest.scm and itself ;;====================================================================== ;; ;; test-names: Comma separated patterns same as test-patts but used in selection ;; of tests to run. The item portions are not respected.
︙			︙
662 663 664 665 666 667 668 ~~669 670 671 672 673 674 675~~ 676 677 678 679 680 681 682	;; run the run prehook if there are no tests yet run for this run: ;; (runs:run-pre-hook run-id) ;; mark all test launched flag as false in the meta table (rmt:set-var (conc "lunch-complete-" run-id) "no") (debug:print-info 1 default-log-port "Setting end-of-run to no") (let* ((config-reruns (let ((x (configf:lookup configdat "setup" "reruns"))) ~~(if x (string->number x) #f))) (config-rerun-cnt (if config-reruns config-reruns 1))) (if (eq? config-rerun-cnt run-count) (rmt:set-var (conc "end-of-run-" run-id) "no")))~~ (rmt:set-run-state-status run-id "new" "n/a") ;; now add non-directly referenced dependencies (i.e. waiton) ;;====================================================================== ;; refactoring this block into tests:get-full-data ;; ;; What happended, this code is now duplicated in tests!? ;;	\| \| \| \| \| \| \|	691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711	;; run the run prehook if there are no tests yet run for this run: ;; (runs:run-pre-hook run-id) ;; mark all test launched flag as false in the meta table (rmt:set-var (conc "lunch-complete-" run-id) "no") (debug:print-info 1 default-log-port "Setting end-of-run to no") (let* ((config-reruns (let ((x (configf:lookup configdat "setup" "reruns"))) (if x (string->number x) #f))) (config-rerun-cnt (if config-reruns config-reruns 1))) (if (eq? config-rerun-cnt run-count) (rmt:set-var (conc "end-of-run-" run-id) "no"))) (rmt:set-run-state-status run-id "new" "n/a") ;; now add non-directly referenced dependencies (i.e. waiton) ;;====================================================================== ;; refactoring this block into tests:get-full-data ;; ;; What happended, this code is now duplicated in tests!? ;;
︙			︙
786 787 788 789 790 791 792 ~~793 794 795 796 797 798 799 800 801~~ 802 803 804 805 806 807 ~~808 809~~ ~~810~~ 811 812 ~~813~~ 814 ~~815 816~~ 817 818 819 820 821 822 823 824 825 826 827 828 ~~829 830~~ 831 832 833 834 835 836 837	(debug:print-info 1 default-log-port "Adding \"" (string-intersperse required-tests " ") "\" to the run queue")) ;; NOTE: these are all parent tests, items are not expanded yet. (debug:print-info 4 default-log-port "test-records=" (hash-table->alist test-records)) (let ((reglen (configf:lookup configdat "setup" "runqueue"))) (if (> (length (hash-table-keys test-records)) 0) (let* ((keep-going #t) (run-queue-retries 5) ~~#;(th1 (make-thread (lambda ()~~ ~~(handle-exceptions~~ ~~exn~~ ~~(begin~~ ~~(print-call-chain)~~ ~~(print " message: " ((condition-property-accessor 'exn 'message) exn)))~~ ~~(runs:run-tests-queue run-id runname test-records keyvals flags test-patts required-tests~~ ~~(any->number reglen) all-tests-registry)))~~ ~~"runs:run-tests-queue"))~~ (th2 (make-thread (lambda () ;; BBQ: why are we visiting ALL runs here? ;; (rmt:find-and-mark-incomplete-all-runs))))) CAN'T INTERRUPT IT ... (let ((run-ids (rmt:get-all-run-ids))) (for-each (lambda (run-id) (if keep-going (handle-exceptions ~~exn (debug:print 0 default-log-port "error in calling find-and-mark-incomplete for run-id " run-id ", exn=" exn)~~ ~~(rmt:find-and-mark-incomplete run-id #f~~)))) ;; ovr-deadtime))) ;; could be root of https://hsdes.intel.com/appstore/article/#/220546828/main -- Title: Megatest jobs show DEAD even though they are still running (1.64/27) run-ids))) "runs: mark-incompletes"))) ~~;; (thread-start! th1)~~ (thread-start! th2) ~~;; (thread-join! th1)~~ ~~;; just do the main stuff in the main thread~~ (runs:run-tests-queue run-id runname test-records keyvals flags test-patts required-tests (any->number reglen) all-tests-registry) (set! keep-going #f) (thread-join! th2) ;; if run-count > 0 call, set -preclean and -rerun STUCK/DEAD (if (> run-count 0) ;; handle reruns (begin (if (not (hash-table-ref/default flags "-preclean" #f)) (hash-table-set! flags "-preclean" #t)) (if (not (hash-table-ref/default flags "-rerun" #f)) (hash-table-set! flags "-rerun" "ABORT,STUCK/DEAD,n/a,ZERO_ITEMS")) ;; recursive call to self ~~(runs:run-tests target runname test-patts user flags run-count: (- run-count 1))) (launch:end-of-run-check run-id)))~~ (debug:print-info 0 default-log-port "No tests to run"))) (debug:print-info 4 default-log-port "All done by here") ;; TODO: try putting post hook call here ; (debug:print-info 2 default-log-port " run-count " run-count) ; (runs:run-post-hook run-id)) ; (debug:print-info 2 default-log-port "Not calling post hook runcount = " run-count ))	< < < < < < < < < \| \| > > > > > \| < < < \| \|	815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859	(debug:print-info 1 default-log-port "Adding \"" (string-intersperse required-tests " ") "\" to the run queue")) ;; NOTE: these are all parent tests, items are not expanded yet. (debug:print-info 4 default-log-port "test-records=" (hash-table->alist test-records)) (let ((reglen (configf:lookup configdat "setup" "runqueue"))) (if (> (length (hash-table-keys test-records)) 0) (let* ((keep-going #t) (run-queue-retries 5) (th2 (make-thread (lambda () ;; BBQ: why are we visiting ALL runs here? ;; (rmt:find-and-mark-incomplete-all-runs))))) CAN'T INTERRUPT IT ... (let ((run-ids (rmt:get-all-run-ids))) (for-each (lambda (run-id) (if keep-going (handle-exceptions exn (debug:print 0 default-log-port "error in calling find-and-mark-incomplete for run-id " run-id ", exn=" exn) ;; lets run this only if a run has been NOT seen for more than 900 seconds (if (> (- (current-seconds)(hash-table-ref/default find-and-mark-incomplete-last-run run-id 0)) 900) (begin (rmt:find-and-mark-incomplete run-id #f) (hash-table-set! find-and-mark-incomplete-last-run run-id (current-seconds))) )))) ;; ovr-deadtime))) ;; could be root of https://hsdes.intel.com/appstore/article/#/220546828/main -- Title: Megatest jobs show DEAD even though they are still running (1.64/27) run-ids))) "runs: mark-incompletes"))) (thread-start! th2) (runs:run-tests-queue run-id runname test-records keyvals flags test-patts required-tests (any->number reglen) all-tests-registry) (set! keep-going #f) (thread-join! th2) ;; if run-count > 0 call, set -preclean and -rerun STUCK/DEAD (if (> run-count 0) ;; handle reruns (begin (if (not (hash-table-ref/default flags "-preclean" #f)) (hash-table-set! flags "-preclean" #t)) (if (not (hash-table-ref/default flags "-rerun" #f)) (hash-table-set! flags "-rerun" "ABORT,STUCK/DEAD,n/a,ZERO_ITEMS")) ;; recursive call to self (runs:run-tests target runname test-patts user flags run-count: (- run-count 1))) (launch:end-of-run-check run-id))) (debug:print-info 0 default-log-port "No tests to run"))) (debug:print-info 4 default-log-port "All done by here") ;; TODO: try putting post hook call here ; (debug:print-info 2 default-log-port " run-count " run-count) ; (runs:run-post-hook run-id)) ; (debug:print-info 2 default-log-port "Not calling post hook runcount = " run-count ))
︙			︙
1268 1269 1270 1271 1272 1273 1274 ~~1275~~ 1276 1277 1278 1279 1280 1281 1282	;; If no resources are available just kill time and loop again ;; ((not have-resources) ;; simply try again after waiting a second (if (runs:lownoise "no resources" 60) (debug:print-info 1 default-log-port "no resources to run new tests, waiting ...")) ;; Have gone back and forth on this but db starvation is an issue. ;; wait one second before looking again to run jobs. ~~(thread-sleep! ~~0.25)~~~~ ;; could have done hed tal here but doing car/cdr of newtal to rotate tests (list (car newtal)(cdr newtal) reg reruns)) ;; This is the final stage, everything is in place so launch the test ;; ((and have-resources (or (null? prereqs-not-met)	\|	1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304	;; If no resources are available just kill time and loop again ;; ((not have-resources) ;; simply try again after waiting a second (if (runs:lownoise "no resources" 60) (debug:print-info 1 default-log-port "no resources to run new tests, waiting ...")) ;; Have gone back and forth on this but db starvation is an issue. ;; wait one second before looking again to run jobs. (thread-sleep! 1) ;; changed back to 1 from 0.25 ;; could have done hed tal here but doing car/cdr of newtal to rotate tests (list (car newtal)(cdr newtal) reg reruns)) ;; This is the final stage, everything is in place so launch the test ;; ((and have-resources (or (null? prereqs-not-met)
︙			︙
1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540	(num-retries 0) (max-retries (configf:lookup configdat "setup" "maxretries")) (max-concurrent-jobs (configf:lookup-number configdat "setup" "max_concurrent_jobs" default: 50)) (reglen (if (number? reglen-in) reglen-in 1)) (last-time-incomplete (- (current-seconds) 900)) ;; force at least one clean up cycle (last-time-some-running (current-seconds)) ;; (tdbdat (tasks:open-db)) (runsdat (make-runs:dat ;; hed: hed ;; tal: tal ;; reg: reg ;; reruns: reruns reglen: reglen regfull: #f ;; regfull	>	1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563	(num-retries 0) (max-retries (configf:lookup configdat "setup" "maxretries")) (max-concurrent-jobs (configf:lookup-number configdat "setup" "max_concurrent_jobs" default: 50)) (reglen (if (number? reglen-in) reglen-in 1)) (last-time-incomplete (- (current-seconds) 900)) ;; force at least one clean up cycle (last-time-some-running (current-seconds)) ;; (tdbdat (tasks:open-db)) (misc-data (make-hash-table)) ;; use as needed (runsdat (make-runs:dat ;; hed: hed ;; tal: tal ;; reg: reg ;; reruns: reruns reglen: reglen regfull: #f ;; regfull
︙			︙
1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 ~~1607~~ 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 ~~1619~~ 1620 1621 1622 1623 1624 1625 1626	;; (if (> (current-seconds)(+ last-time-incomplete 900)) (begin (set! last-time-incomplete (current-seconds)) ;; (rmt:find-and-mark-incomplete-all-runs) )) ;; (print "Top of loop, hed=" hed ", tal=" tal " ,reruns=" reruns) (let* ((test-record (hash-table-ref test-records hed)) (test-name (tests:testqueue-get-testname test-record)) (tconfig (tests:testqueue-get-testconfig test-record)) (jobgroup (configf:lookup tconfig "test_meta" "jobgroup")) (testmode (let ((m (configf:lookup tconfig "requirements" "mode"))) (if m (map string->symbol (string-split m)) '(normal)))) (itemmaps (tests:get-itemmaps tconfig)) ;; (configf:lookup tconfig "requirements" "itemmap")) (priority (tests:testqueue-get-priority test-record)) (itemdat (tests:testqueue-get-itemdat test-record)) ;; itemdat can be a string, list or #f (items (tests:testqueue-get-items test-record)) (item-path (item-list->path itemdat)) (tfullname (db:test-make-full-name test-name item-path)) ~~;; these are hard coded item-item waits test/item-path => test/item-path2 ...~~ (extra-waits (let* ((section (configf:get-section (tests:testqueue-get-testconfig test-record) "waitons")) (myextra (alist-ref tfullname section equal?))) (if myextra (let ((extras (string-split (car myextra)))) (if (runs:lownoise (conc tfullname "extra-waitons" tfullname) 60) (debug:print-info 0 default-log-port "HAVE EXTRA WAITONS for test " tfullname ": " myextra)) (for-each (lambda (extra) ;; (debug:print 0 default-log-port "FYI: extra = " extra " reruns = " reruns) (let ((basetestname (car (string-split extra "/")))) #;(if (not (member extra tal)) ~~(set! reruns (append tal (list extra))))~~ (if (not (member basetestname tal)) (set! reruns (append tal (list basetestname)))) )) extras) extras) '()))) (waitons (delete-duplicates (append (tests:testqueue-get-waitons test-record) extra-waits) equal?))	> > > > > > \| \|	1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655	;; (if (> (current-seconds)(+ last-time-incomplete 900)) (begin (set! last-time-incomplete (current-seconds)) ;; (rmt:find-and-mark-incomplete-all-runs) )) ;; WAIT FOR TIME ON TIGHT LOOP (if (< (- (current-milliseconds)(hash-table-ref/default misc-data "tight-loop-last-time" 0)) 100) ;; less than 1/100 second since came through the loop (thread-sleep! 0.1)) ;; wait a 1/100 seconds (hash-table-set! misc-data "tight-loop-last-time" (current-milliseconds)) ;; (print "Top of loop, hed=" hed ", tal=" tal " ,reruns=" reruns) (let* ((test-record (hash-table-ref test-records hed)) (test-name (tests:testqueue-get-testname test-record)) (tconfig (tests:testqueue-get-testconfig test-record)) (jobgroup (configf:lookup tconfig "test_meta" "jobgroup")) (testmode (let ((m (configf:lookup tconfig "requirements" "mode"))) (if m (map string->symbol (string-split m)) '(normal)))) (itemmaps (tests:get-itemmaps tconfig)) ;; (configf:lookup tconfig "requirements" "itemmap")) (priority (tests:testqueue-get-priority test-record)) (itemdat (tests:testqueue-get-itemdat test-record)) ;; itemdat can be a string, list or #f (items (tests:testqueue-get-items test-record)) (item-path (item-list->path itemdat)) (tfullname (db:test-make-full-name test-name item-path)) ;; these are hard coded item-item waits test/item-path => test/item-path2 ... (extra-waits (let* ((section (configf:get-section (tests:testqueue-get-testconfig test-record) "waitons")) (myextra (alist-ref tfullname section equal?))) (if myextra (let ((extras (string-split (car myextra)))) (if (runs:lownoise (conc tfullname "extra-waitons" tfullname) 60) (debug:print-info 0 default-log-port "HAVE EXTRA WAITONS for test " tfullname ": " myextra)) (for-each (lambda (extra) ;; (debug:print 0 default-log-port "FYI: extra = " extra " reruns = " reruns) (let ((basetestname (car (string-split extra "/")))) #;(if (not (member extra tal)) (set! reruns (append tal (list extra)))) (if (not (member basetestname tal)) (set! reruns (append tal (list basetestname)))) )) extras) extras) '()))) (waitons (delete-duplicates (append (tests:testqueue-get-waitons test-record) extra-waits) equal?))
︙			︙
1639 1640 1641 1642 1643 1644 1645 ~~1646~~ 1647 1648 1649 1650 1651 1652 1653	waitons: waitons testmode: testmode newtal: newtal itemmaps: itemmaps ;; prereqs-not-met: prereqs-not-met ))) (runs:dat-regfull-set! runsdat regfull) (if (> num-running 0) (set! last-time-some-running (current-seconds))) (if (> (current-seconds)(+ last-time-some-running (or (configf:lookup configdat "setup" "give-up-waiting") 36000))) (hash-table-set! max-tries-hash tfullname (+ (hash-table-ref/default max-tries-hash tfullname 0) 1))) ;; (debug:print 0 default-log-port "max-tries-hash: " (hash-table->alist max-tries-hash))	\|	1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682	waitons: waitons testmode: testmode newtal: newtal itemmaps: itemmaps ;; prereqs-not-met: prereqs-not-met ))) (runs:dat-regfull-set! runsdat regfull) (if (> num-running 0) (set! last-time-some-running (current-seconds))) (if (> (current-seconds)(+ last-time-some-running (or (configf:lookup configdat "setup" "give-up-waiting") 36000))) (hash-table-set! max-tries-hash tfullname (+ (hash-table-ref/default max-tries-hash tfullname 0) 1))) ;; (debug:print 0 default-log-port "max-tries-hash: " (hash-table->alist max-tries-hash))
︙			︙
1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784	;; wait for load here (if (runs:dat-load-mgmt-function runsdat)((runs:dat-load-mgmt-function runsdat))) (loop-can-run-more (runs:can-run-more-tests runsdat run-id jobgroup max-concurrent-jobs) (- remtries 1))))))) ))))) ;; I'm not clear on why prereqs are gathered here TODO: verfiy this is needed (runs:testdat-prereqs-not-met-set! testdat (rmt:get-prereqs-not-met run-id waitons hed item-path mode: testmode itemmaps: itemmaps)) ;; I'm not clear on why we'd capture running job counts here TODO: verify this is needed (runs:dat-can-run-more-tests-set! runsdat (runs:can-run-more-tests runsdat run-id jobgroup max-concurrent-jobs)) (let ((loop-list (runs:process-expanded-tests runsdat testdat))) ;; in process-expanded-tests ultimately run:test -> launch-test -> test actually running (if loop-list (apply loop loop-list))))	> > >	1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816	;; wait for load here (if (runs:dat-load-mgmt-function runsdat)((runs:dat-load-mgmt-function runsdat))) (loop-can-run-more (runs:can-run-more-tests runsdat run-id jobgroup max-concurrent-jobs) (- remtries 1))))))) ))))) ;; I'm not clear on why prereqs are gathered here TODO: verfiy this is needed (let ((waited (runs:wait-if-seen-recently 5 "prereqs-not-met" hed item-path))) ;; if we've been down this path in the past 5 seconds - wait out the difference (if (> waited 0)(debug:print 0 default-log-port "Waited for prereqs-not-met-"hed"-"item-path" for " waited "seconds."))) (runs:testdat-prereqs-not-met-set! testdat (rmt:get-prereqs-not-met run-id waitons hed item-path mode: testmode itemmaps: itemmaps)) ;; I'm not clear on why we'd capture running job counts here TODO: verify this is needed (runs:dat-can-run-more-tests-set! runsdat (runs:can-run-more-tests runsdat run-id jobgroup max-concurrent-jobs)) (let ((loop-list (runs:process-expanded-tests runsdat testdat))) ;; in process-expanded-tests ultimately run:test -> launch-test -> test actually running (if loop-list (apply loop loop-list))))
︙			︙
1850 1851 1852 1853 1854 1855 1856 ~~1857~~ 1858 1859 1860 1861 1862 1863 1864	(if loop-list (apply loop loop-list) (debug:print-info 4 default-log-port " -- Can't expand hed="hed) ) ) ;; if can't run more just loop with next possible test (loop (car newtal)(cdr newtal) reg reruns)))) ;; this case should not happen, added to help catch any bugs ((and (list? items) itemdat) (debug:print-info 4 default-log-port "cond branch - " "rtq-5") (debug:print-error 0 default-log-port "Should not have a list of items in a test and the itemspath set - please report this") (exit 1))	\|	1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896	(if loop-list (apply loop loop-list) (debug:print-info 4 default-log-port " -- Can't expand hed="hed) ) ) ;; if can't run more just loop with next possible test (loop (car newtal)(cdr newtal) reg reruns)))) ;; this case should not happen, added to help catch any bugs ((and (list? items) itemdat) (debug:print-info 4 default-log-port "cond branch - " "rtq-5") (debug:print-error 0 default-log-port "Should not have a list of items in a test and the itemspath set - please report this") (exit 1))
︙			︙
1884 1885 1886 1887 1888 1889 1890 ~~1891~~ 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 ~~1906~~ 1907 ~~1908 1909 1910 1911~~ ~~1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924~~ 1925 1926 1927 1928 1929 1930 1931	(loop (car reg)(cdr reg) '() reruns)) (else (debug:print-info 4 default-log-port "cond branch - " "rtq-9") (debug:print-info 4 default-log-port "Exiting loop with...\n hed=" hed "\n tal=" tal "\n reruns=" reruns)) ))) ;; end loop on sorted test names ;; this is the point where everything is launched and now you can mark the run in metadata table as all launched (rmt:set-var (conc "lunch-complete-" run-id) "yes") ;; now if -run-wait we wait for all tests to be done ;; Now wait for any RUNNING tests to complete (if in run-wait mode) ;; (if (runs:dat-load-mgmt-function runsdat)((runs:dat-load-mgmt-function runsdat))) (thread-sleep! 10) ;; I think there is a race condition here. Let states/statuses settle (let wait-loop ((num-running (rmt:get-count-tests-running-for-run-id run-id)) (prev-num-running 0)) ;; (debug:print-info 13 default-log-port "num-running=" num-running ", prev-num-running=" prev-num-running) (if (and (or (args:get-arg "-run-wait") (equal? (configf:lookup configdat "setup" "run-wait") "yes")) (> num-running 0)) (begin ;; Here we mark any old defunct tests as incomplete. Do this every fifteen minutes ;; (debug:print 0 default-log-port "Got here eh! num-running=" num-running " (> num-running 0) " (> num-running 0)) ~~(if (> (current-seconds)(~~+ l~~ast-t~~ime~~-incomplete ~~900)~~)~~ (let ((actual-num-running (rmt:get-count-tests-running-for-run-id run-id))) (debug:print-info 0 default-log-port "Marking stuck tests as INCOMPLETE while waiting for run " run-id ". Running as pid " (current-process-id) " on " (get-host-name)) (set! last-time-incomplete (current-seconds)) ;; FIXME, this might be causing slow down - use of set! (rmt:find-and-mark-incomplete run-id #f) (debug:print-info 0 default-log-port "run-wait specified, waiting on " actual-num-running " tests in RUNNING, REMOTEHOSTSTART or LAUNCHED state at " (time->string (seconds->local-time (current-seconds)))))) ;; (if (runs:dat-load-mgmt-function runsdat)((runs:dat-load-mgmt-function runsdat))) (thread-sleep! 5) ;; (if (>= num-running max-concurrent-jobs) 5 1)) (wait-loop (rmt:get-count-tests-running-for-run-id run-id) num-running)))) ;; LET* ((test-record ;; we get here on "drop through". All done! ;; this is moved to runs:run-testes since this function is getting called twice to ensure everthing is completed. ;; (debug:print-info 0 default-log-port "Calling Post Hook") ;; (runs:run-post-hook run-id) (debug:print-info 1 default-log-port "All tests launched"))) (define (runs:calc-fails prereqs-not-met) (filter (lambda (test) (and (vector? test) ;; not (string? test)) (member (db:test-get-state test) '("INCOMPLETE" "COMPLETED")) ;; TODO: pull from common:stuff... (not (member (db:test-get-status test) '("PASS" "WARN" "WAIVED" "SKIP")))))	\| \| > \| \| \| \| > \| \| \| \| \| \| \| \| \| \| \| \| \|	1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965	(loop (car reg)(cdr reg) '() reruns)) (else (debug:print-info 4 default-log-port "cond branch - " "rtq-9") (debug:print-info 4 default-log-port "Exiting loop with...\n hed=" hed "\n tal=" tal "\n reruns=" reruns)) ))) ;; end loop on sorted test names ;; this is the point where everything is launched and now you can mark the run in metadata table as all launched (rmt:set-var (conc "lunch-complete-" run-id) "yes") ;; now if -run-wait we wait for all tests to be done ;; Now wait for any RUNNING tests to complete (if in run-wait mode) ;; (if (runs:dat-load-mgmt-function runsdat)((runs:dat-load-mgmt-function runsdat))) (thread-sleep! 10) ;; I think there is a race condition here. Let states/statuses settle (let wait-loop ((num-running (rmt:get-count-tests-running-for-run-id run-id)) (prev-num-running 0)) ;; (debug:print-info 13 default-log-port "num-running=" num-running ", prev-num-running=" prev-num-running) (if (and (or (args:get-arg "-run-wait") (equal? (configf:lookup configdat "setup" "run-wait") "yes")) (> num-running 0)) (begin ;; Here we mark any old defunct tests as incomplete. Do this every fifteen minutes ;; (debug:print 0 default-log-port "Got here eh! num-running=" num-running " (> num-running 0) " (> num-running 0)) (if (> (- (current-seconds)(hash-table-ref/default find-and-mark-incomplete-last-run run-id 0)) 900) (let ((actual-num-running (rmt:get-count-tests-running-for-run-id run-id))) (let ((actual-num-running num-running)) ;; (rmt:get-count-tests-running-for-run-id run-id))) ;; why call it again? (debug:print-info 0 default-log-port "Marking stuck tests as INCOMPLETE while waiting for run " run-id ". Running as pid " (current-process-id) " on " (get-host-name)) ;; (set! last-time-incomplete (current-seconds)) ;; FIXME, this might be causing slow down - use of set! (rmt:find-and-mark-incomplete run-id #f) (hash-table-set! find-and-mark-incomplete-last-run run-id (current-seconds)) (debug:print-info 0 default-log-port "run-wait specified, waiting on " actual-num-running " tests in RUNNING, REMOTEHOSTSTART or LAUNCHED state at " (time->string (seconds->local-time (current-seconds))))) ;; (if (runs:dat-load-mgmt-function runsdat)((runs:dat-load-mgmt-function runsdat))) (thread-sleep! 5)) ;; (if (>= num-running max-concurrent-jobs) 5 1)) (wait-loop (rmt:get-count-tests-running-for-run-id run-id) num-running))))) ;; LET* ((test-record ;; we get here on "drop through". All done! ;; this is moved to runs:run-testes since this function is getting called twice to ensure everthing is completed. ;; (debug:print-info 0 default-log-port "Calling Post Hook") ;; (runs:run-post-hook run-id) (debug:print-info 1 default-log-port "All tests launched"))) (define (runs:calc-fails prereqs-not-met) (filter (lambda (test) (and (vector? test) ;; not (string? test)) (member (db:test-get-state test) '("INCOMPLETE" "COMPLETED")) ;; TODO: pull from common:stuff... (not (member (db:test-get-status test) '("PASS" "WARN" "WAIVED" "SKIP")))))
︙			︙