Overview
Comment: | wip |
---|---|
Downloads: | Tarball | ZIP archive | SQL archive |
Timelines: | family | ancestors | descendants | both | v1.80-revolution-remodularization |
Files: | files | file ages | folders |
SHA1: |
a6f9565756ebe0d7df57d7963a11bb50 |
User & Date: | matt on 2024-02-03 17:53:40 |
Other Links: | branch diff | manifest | tags |
Context
2024-02-03
| ||
18:30 | Compiles check-in: 5b8df5d5d2 user: matt tags: v1.80-revolution-remodularization | |
17:53 | wip check-in: a6f9565756 user: matt tags: v1.80-revolution-remodularization | |
2024-02-02
| ||
20:54 | wip check-in: c1b66a127b user: matt tags: v1.80-revolution-remodularization | |
Changes
Modified Makefile from [69312e8b25] to [6a96adb19b].
︙ | ︙ | |||
57 58 59 60 61 62 63 | process.o : mofiles/processmod.o mofiles/configfmod.o : mofiles/processmod.o mofiles/processmod.o : mofiles/commonmod.o mofiles/servermod.o : mofiles/commonmod.o mofiles/rmtmod.o : mofiles/mtmod.o mofiles/apimod.o mofiles/dbmod.o : mofiles/mtmod.o # mofiles/mtmod.o : mofiles/tcp-transportmod.o | | < > | 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 | process.o : mofiles/processmod.o mofiles/configfmod.o : mofiles/processmod.o mofiles/processmod.o : mofiles/commonmod.o mofiles/servermod.o : mofiles/commonmod.o mofiles/rmtmod.o : mofiles/mtmod.o mofiles/apimod.o mofiles/dbmod.o : mofiles/mtmod.o # mofiles/mtmod.o : mofiles/tcp-transportmod.o mofiles/megatestmod.o : mofiles/pkts.o mofiles/servermod.o mofiles/mtmod.o : mofiles/testsmod.o mofiles/dbfile.o : \ mofiles/debugprint.o mofiles/commonmod.o mofiles/configfmod.o mofiles/apimod.o : mofiles/commonmod.o mofiles/tcp-transportmod.o mofiles/configfmod.o mofiles/megatestmod.o mofiles/dbmod.o : mofiles/dbfile.o mofiles/api.o : mofiles/apimod.o mofiles/commonmod.o : mofiles/debugprint.o mofiles/stml2.o configf.o : commonmod.import.o mofiles/dbfile.o : mofiles/debugprint.o mofiles/rmtmod.o mofiles/dbmod.o : mofiles/dbfile.o mofiles/commonmod.o mofiles/debugprint.o db.o : mofiles/dbmod.o mofiles/dbfile.o mofiles/debugprint.o : mofiles/mtargs.o mofiles/tcp-transportmod.o : mofiles/portlogger.o mofiles/tasksmod.o : mofiles/rmtmod.o # ftail.scm rmtmod.scm commonmod.scm removed # MSRCFILES = ducttape-lib.scm pkts.scm stml2.scm cookie.scm mutils.scm \ # mtargs.scm commonmod.scm dbmod.scm adjutant.scm ulex.scm \ # rmtmod.scm apimod.scm GUISRCF = dashboard-context-menu.scm dashboard-tests.scm \ |
︙ | ︙ |
Modified dashboard.scm from [fc7085a8f8] to [c2284605cc].
︙ | ︙ | |||
3853 3854 3855 3856 3857 3858 3859 | (if (not (launch:setup)) (begin (debug:print 0 *default-log-port* "Failed to find megatest.config, exiting") (exit 1) ) ) | | | 3853 3854 3855 3856 3857 3858 3859 3860 3861 3862 3863 3864 3865 3866 3867 | (if (not (launch:setup)) (begin (debug:print 0 *default-log-port* "Failed to find megatest.config, exiting") (exit 1) ) ) #;(if (not (rmt:on-homehost?)) (begin (debug:print 0 *default-log-port* "WARNING: You are starting the dashboard on a machine that is not the homehost:" (server:get-homehost)) (debug:print 0 *default-log-port* "It will be slower.") )) (if (and (common:file-exists? mtdb-path) |
︙ | ︙ |
Modified launch.scm from [93ed8cfb9c] to [0c556cbd21].
︙ | ︙ | |||
1771 1772 1773 1774 1775 1776 1777 | (read-symbolic-link (conc "/proc/" pid "/cwd")) #f))) ;; now wait on that process if all is correct ;; periodically update the db with runtime ;; when the process exits look at the db, if still RUNNING after 10 seconds set ;; state/status appropriately (process-wait pid))) | > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > | 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 | (read-symbolic-link (conc "/proc/" pid "/cwd")) #f))) ;; now wait on that process if all is correct ;; periodically update the db with runtime ;; when the process exits look at the db, if still RUNNING after 10 seconds set ;; state/status appropriately (process-wait pid))) ;;====================================================================== ;; Maintenance ;;====================================================================== (define (rmt:find-and-mark-incomplete run-id ovr-deadtime) (let* ((cfg-deadtime (configf:lookup-number *configdat* "setup" "deadtime")) (test-stats-update-period (configf:lookup-number *configdat* "setup" "test-stats-update-period"))) (rmt:find-and-mark-incomplete-engine run-id ovr-deadtime cfg-deadtime test-stats-update-period) ;;call end of eud of run detection for posthook (launch:end-of-run-check run-id))) ;; select end_time-now from ;; (select testname,item_path,event_time+run_duration as ;; end_time,strftime('%s','now') as now from tests where state in ;; ('RUNNING','REMOTEHOSTSTART','LAUNCHED')); ;; ;; NOT EASY TO MIGRATE TO db{file,mod} ;; (define (rmt:find-and-mark-incomplete-engine run-id ovr-deadtime cfg-deadtime test-stats-update-period) (let* ((incompleted '()) (oldlaunched '()) (toplevels '()) ;; The default running-deadtime is 720 seconds = 12 minutes. ;; "(running-deadtime-default (+ server-start-allowance (* 2 launch-monitor-period)))" = 200 + (2 * (200 + 30 + 30)) (deadtime-trim (or ovr-deadtime cfg-deadtime)) (server-start-allowance 200) (server-overloaded-budget 200) (launch-monitor-off-time (or test-stats-update-period 30)) (launch-monitor-on-time-budget 30) (launch-monitor-period (+ launch-monitor-off-time launch-monitor-on-time-budget server-overloaded-budget)) (remotehoststart-deadtime-default (+ server-start-allowance server-overloaded-budget 30)) (remotehoststart-deadtime (or deadtime-trim remotehoststart-deadtime-default)) (running-deadtime-default (+ server-start-allowance (* 2 launch-monitor-period))) (running-deadtime (or deadtime-trim running-deadtime-default))) ;; two minutes (30 seconds between updates, this leaves 3x grace period) (debug:print-info 4 *default-log-port* "running-deadtime = " running-deadtime) (debug:print-info 4 *default-log-port* "deadtime-trim = " deadtime-trim) (let* ((dat (rmt:get-toplevels-and-incompletes run-id running-deadtime remotehoststart-deadtime))) (set! oldlaunched (list-ref dat 1)) (set! toplevels (list-ref dat 2)) (set! incompleted (list-ref dat 0))) (debug:print-info 18 *default-log-port* "Found " (length oldlaunched) " old LAUNCHED items, " (length toplevels) " old LAUNCHED toplevel tests and " (length incompleted) " tests marked RUNNING but apparently dead.") ;; These are defunct tests, do not do all the overhead of set-state-status. Force them to INCOMPLETE. ;; ;; (db:delay-if-busy dbdat) (let* ((min-incompleted-ids (map car incompleted)) ;; do 'em all (all-ids (append min-incompleted-ids (map car oldlaunched)))) (if (> (length all-ids) 0) (begin ;; (launch:is-test-alive "localhost" 435) (debug:print 0 *default-log-port* "WARNING: Marking test(s); " (string-intersperse (map conc all-ids) ", ") " as DEAD") (for-each (lambda (test-id) (let* ((tinfo (rmt:get-test-info-by-id run-id test-id)) (run-dir (db:test-get-rundir tinfo)) (host (db:test-get-host tinfo)) (pid (db:test-get-process_id tinfo)) (result (rmt:get-status-from-final-status-file run-dir))) (if (and (list? result) (> (length result) 1) (equal? "PASS" (cadr result)) (equal? "COMPLETED" (car result))) (begin (debug:print 0 *default-log-port* "INFO: test " test-id " actually passed, so marking PASS not DEAD") (rmt:set-state-status-and-roll-up-items run-id test-id 'foo "COMPLETED" "PASS" "Test stopped responding but it has PASSED; marking it PASS in the DB.")) (let ((is-alive (and (not (eq? pid 0)) ;; 0 is default in re-used field "attemptnum" where pid stored. (commonmod:is-test-alive host pid)))) (if is-alive (debug:print 0 *default-log-port* "INFO: test " test-id " on host " host " has a process on pid " pid ", NOT setting to DEAD.") (begin (debug:print 0 *default-log-port* "INFO: test " test-id " final state/status is not COMPLETED/PASS. It is " result) (rmt:set-state-status-and-roll-up-items run-id test-id 'foo "COMPLETED" "DEAD" "Test stopped responding while in RUNNING or REMOTEHOSTSTART; presumed dead."))))))) ;; call end of eud of run detection for posthook - from merge, is it needed? ;; (launch:end-of-run-check run-id) all-ids) ))))) |
Modified megatestmod.scm from [f8d653481a] to [9862464d20].
︙ | ︙ | |||
30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 | (declare (uses dbmod)) (declare (uses dbfile)) (declare (uses commonmod)) (declare (uses configfmod)) (declare (uses processmod)) (declare (uses mtmod)) (declare (uses pkts)) (use srfi-69) (module megatestmod * (import scheme) (cond-expand (chicken-4 (import chicken ports (prefix base64 base64:) (prefix sqlite3 sqlite3:) data-structures extras files | > > < < < | < < < < < < < < | | 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 | (declare (uses dbmod)) (declare (uses dbfile)) (declare (uses commonmod)) (declare (uses configfmod)) (declare (uses processmod)) (declare (uses mtmod)) (declare (uses pkts)) (declare (uses servermod)) (declare (uses dbi)) (use srfi-69) (module megatestmod * (import scheme) (cond-expand (chicken-4 (import chicken ports (prefix base64 base64:) (prefix sqlite3 sqlite3:) data-structures extras files pathname-expand posix posix-extras (prefix dbi dbi:) directory-utils ) (use srfi-69)) (chicken-5 (import (prefix sqlite3 sqlite3:) ;; data-structures ;; extras ;; files |
︙ | ︙ | |||
97 98 99 100 101 102 103 | md5 message-digest pathname-expand system-information ))) | | > > > > > > < < > > < < | > > > | 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 | md5 message-digest pathname-expand system-information ))) (import call-with-environment-variables matchable md5 message-digest regex regex-case sparse-vectors srfi-1 srfi-13 srfi-18 srfi-69 typed-records z3 (prefix mtargs args:) commonmod configfmod dbfile dbmod debugprint mtmod pkts processmod servermod ) (define read-config (lambda ()(assert #f "FATAL: read-config proc not set!"))) (define (read-config-set! proc) (set! read-config proc)) |
︙ | ︙ | |||
312 313 314 315 316 317 318 | #t)))) ;; default to requiring server (if force-result (begin (debug:print-info 0 *default-log-port* "ATTENTION! Forcing use of server, force setting is \"" force-setting "\".") #t) #f))) | < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < | 310 311 312 313 314 315 316 317 318 319 320 321 322 323 | #t)))) ;; default to requiring server (if force-result (begin (debug:print-info 0 *default-log-port* "ATTENTION! Forcing use of server, force setting is \"" force-setting "\".") #t) #f))) ;; -mrw- this appears to not be used ;; ;; (define (common:print-delay-table) ;; (let loop ((x 0)) ;; (print x "," (common:get-delay x 1)) ;; (if (< x 2) ;; (loop (+ x 0.1))))) |
︙ | ︙ | |||
366 367 368 369 370 371 372 | ;; (if match ;; (let ((newval (string->number (cadr match)))) ;; (if (number? newval) ;; (set! cpu-load newval)))))) ;; (car load-res)) ;; cpu-load)) | < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < | 334 335 336 337 338 339 340 341 342 343 344 345 346 347 | ;; (if match ;; (let ((newval (string->number (cadr match)))) ;; (if (number? newval) ;; (set! cpu-load newval)))))) ;; (car load-res)) ;; cpu-load)) ;;====================================================================== ;; given path get free space, allows override in [setup] ;; with free-space-script /path/to/some/script.sh ;; (define (get-df path) (if (configf:lookup *configdat* "setup" "free-space-script") (with-input-from-pipe |
︙ | ︙ | |||
542 543 544 545 546 547 548 | ;;(print "Processing: " disk-num " bestsize: " bestsize " best: " best " freespc: " freespc " min-inodes: " min-inodes " free-inodes: " free-inodes) )) (map car disks)) (if (and best (> bestsize minsize)) best #f))) ;; #f means no disk candidate found | < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < | 444 445 446 447 448 449 450 451 452 453 454 455 456 457 | ;;(print "Processing: " disk-num " bestsize: " bestsize " best: " best " freespc: " freespc " min-inodes: " min-inodes " free-inodes: " free-inodes) )) (map car disks)) (if (and best (> bestsize minsize)) best #f))) ;; #f means no disk candidate found (define (common:get-pkts-dirs mtconf use-lt) (let* ((pktsdirs-str (or (configf:lookup mtconf "setup" "pktsdirs") (and use-lt (conc (or *toppath* (current-directory)) "/lt/.pkts")))) (pktsdirs (if pktsdirs-str |
︙ | ︙ | |||
739 740 741 742 743 744 745 | (let* ((uname-res (process:cmd-run->list (conc "uname " (if (null? params) "-a" (car params))))) (uname #f)) (if (null? (car uname-res)) "unknown" (caar uname-res)))) | < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < | 567 568 569 570 571 572 573 574 575 576 577 578 579 580 | (let* ((uname-res (process:cmd-run->list (conc "uname " (if (null? params) "-a" (car params))))) (uname #f)) (if (null? (car uname-res)) "unknown" (caar uname-res)))) ;;====================================================================== ;; use-lt is use linktree "lt" link to find pkts dir (define (common:save-pkt pktalist-in mtconf use-lt #!key (add-only #f)) ;; add-only saves the pkt only if there is a parent already (if (or (not add-only) (hash-table-exists? *pkts-info* 'last-parent)) (let* ((parent (hash-table-ref/default *pkts-info* 'last-parent #f)) (pktalist (if parent |
︙ | ︙ | |||
1126 1127 1128 1129 1130 1131 1132 | ((or (list? items)(list? itemstable)) ;; calc now (debug:print-info 4 *default-log-port* "items and itemstable are lists, calc now\n" " items: " items " itemstable: " itemstable) (items:get-items-from-config tconfig)) (else #f)))) ;; not iterated | < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < | 920 921 922 923 924 925 926 927 928 929 930 931 932 933 | ((or (list? items)(list? itemstable)) ;; calc now (debug:print-info 4 *default-log-port* "items and itemstable are lists, calc now\n" " items: " items " itemstable: " itemstable) (items:get-items-from-config tconfig)) (else #f)))) ;; not iterated ;; given waiting-test that is waiting on waiton-test extend test-patt appropriately ;; ;; genlib/testconfig sim/testconfig ;; genlib/sch sim/sch/cell1 ;; ;; [requirements] [requirements] ;; mode itemwait |
︙ | ︙ | |||
1241 1242 1243 1244 1245 1246 1247 | (else ;; not waiting on items, waiting on entire waiton test. (let* ((patts (string-split test-patt ",")) (new-patts (if (member waiton-test patts) patts (cons waiton-test patts)))) (string-intersperse (delete-duplicates new-patts) ","))))) | < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < | < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < | 976 977 978 979 980 981 982 983 | (else ;; not waiting on items, waiting on entire waiton test. (let* ((patts (string-split test-patt ",")) (new-patts (if (member waiton-test patts) patts (cons waiton-test patts)))) (string-intersperse (delete-duplicates new-patts) ","))))) ) |
Modified rmtmod.scm from [f29d960503] to [21ad49452d].
︙ | ︙ | |||
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 | ;; along with Megatest. If not, see <http://www.gnu.org/licenses/>. ;;====================================================================== (declare (unit rmtmod)) (declare (uses debugprint)) (declare (uses commonmod)) (declare (uses dbfile)) ;; needed for records (declare (uses dbmod)) (declare (uses mtmod)) (declare (uses tcp-transportmod)) (declare (uses apimod)) (module rmtmod * | > > | > > > > > > > | > > > > | > > | 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 | ;; along with Megatest. If not, see <http://www.gnu.org/licenses/>. ;;====================================================================== (declare (unit rmtmod)) (declare (uses debugprint)) (declare (uses commonmod)) (declare (uses configfmod)) (declare (uses dbfile)) ;; needed for records (declare (uses dbmod)) (declare (uses mtmod)) (declare (uses tcp-transportmod)) (declare (uses apimod)) (declare (uses servermod)) (module rmtmod * (import scheme chicken data-structures regex extras matchable srfi-1 srfi-69 (prefix sqlite3 sqlite3:) posix typed-records srfi-18) (import commonmod configfmod tcp-transportmod dbfile dbmod debugprint apimod mtmod servermod ) (include "db_records.scm") (defstruct alldat (areapath #f) (ulexdat #f) ) |
︙ | ︙ | |||
220 221 222 223 224 225 226 | ) (let ((res (with-input-from-file infile read-lines))) (if (null? res) #f res))))) ;; (string-split (car res))))))) <== I would have preferred a single line STATE STATUS without "'s ;; (string-split (car res))))))) ;; DUNNO WHICH IS CORRECT | < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < | 235 236 237 238 239 240 241 242 243 244 245 246 247 248 | ) (let ((res (with-input-from-file infile read-lines))) (if (null? res) #f res))))) ;; (string-split (car res))))))) <== I would have preferred a single line STATE STATUS without "'s ;; (string-split (car res))))))) ;; DUNNO WHICH IS CORRECT ;;====================================================================== (define *send-receive-mutex* (make-mutex)) ;; should have separate mutex per run-id (define *ttdat* #f) ;; NB// area-dat replaced by ttdat ;; |
︙ | ︙ | |||
996 997 998 999 1000 1001 1002 | (begin (tt-ro-mode-set! runremote ro-mode) (tt-ro-mode-checked-set! runremote #t) ro-mode) ro-mode)))))) | < < < < < < < < < < < | 936 937 938 939 940 941 942 943 944 945 946 947 948 949 | (begin (tt-ro-mode-set! runremote ro-mode) (tt-ro-mode-checked-set! runremote #t) ro-mode) ro-mode)))))) ;;====================================================================== ;; S U P P O R T F U N C T I O N S ;;====================================================================== (define (rmt:on-homehost? runremote) (let* ((hh-dat (remote-hh-dat runremote))) (if (pair? hh-dat) |
︙ | ︙ | |||
1069 1070 1071 1072 1073 1074 1075 | (not (equal? (common:get-last-run-version) (common:version-signature)))) ;; From 1.70 to 1.80, db's are compatible. (define (common:api-changed?) | < | | < | | < < | 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 | (not (equal? (common:get-last-run-version) (common:version-signature)))) ;; From 1.70 to 1.80, db's are compatible. (define (common:api-changed?) (let* ((megatest-major-version (substring (->string megatest-version) 0 4)) (run-major-version (substring (conc (common:get-last-run-version)) 0 4))) (and (not (equal? megatest-major-version "1.80")) (not (equal? megatest-major-version run-major-version))))) ;;====================================================================== ;; Move me elsewhere ... ;; RADT => Why do we meed the version check here, this is called only if version misma ;; (define (common:cleanup-db dbstruct #!key (full #f)) (case (rmt:transport-mode) |
︙ | ︙ | |||
1216 1217 1218 1219 1220 1221 1222 | (define (mt:test-set-state-status-by-testname run-id test-name item-path new-state new-status new-comment) ;(let ((test-id (rmt:get-test-id run-id test-name item-path))) (rmt:set-state-status-and-roll-up-items run-id test-name item-path new-state new-status new-comment) ;; (mt:process-triggers run-id test-id new-state new-status) #t);) ;;(mt:test-set-state-status-by-id run-id test-id new-state new-status new-comment))) | < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < | 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 | (define (mt:test-set-state-status-by-testname run-id test-name item-path new-state new-status new-comment) ;(let ((test-id (rmt:get-test-id run-id test-name item-path))) (rmt:set-state-status-and-roll-up-items run-id test-name item-path new-state new-status new-comment) ;; (mt:process-triggers run-id test-id new-state new-status) #t);) ;;(mt:test-set-state-status-by-id run-id test-id new-state new-status new-comment))) (define (tests:test-set-toplog! run-id test-name logf) (rmt:general-call 'tests:test-set-toplog run-id logf run-id test-name)) ) |
Modified runs.scm from [a0e38aa67a] to [4673823194].
︙ | ︙ | |||
1196 1197 1198 1199 1200 1201 1202 | ;; (if (runs:dat-load-mgmt-function runsdat) ((runs:dat-load-mgmt-function runsdat)) (runs:dat-load-mgmt-function-set! runsdat (lambda () ;; jobtools maxload is useful for where the full Megatest run is done on one machine | | | 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 | ;; (if (runs:dat-load-mgmt-function runsdat) ((runs:dat-load-mgmt-function runsdat)) (runs:dat-load-mgmt-function-set! runsdat (lambda () ;; jobtools maxload is useful for where the full Megatest run is done on one machine (if (and (not (rmt:on-homehost?)) maxload) ;; only gate if maxload is specified, NOTE: maxload is normalized, i.e. load=1 means all cpus fully utilized (common:wait-for-normalized-load maxload "Waiting for load to drop before starting more tests" #f)) ;; jobtools maxhomehostload is intended to prevent overloading on the homehost which can cause database corruption issues (if maxhomehostload (common:wait-for-homehost-load maxhomehostload (conc "Waiting for homehost load to drop below normalized value of " maxhomehostload)))))) |
︙ | ︙ |
Modified servermod.scm from [e690f680a7] to [cc0c6e8294].
︙ | ︙ | |||
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 | ;; along with Megatest. If not, see <http://www.gnu.org/licenses/>. ;; (declare (unit servermod)) (declare (uses commonmod)) (declare (uses configfmod)) (declare (uses debugprint)) (declare (uses mtargs)) (module servermod * (use (srfi 18) extras s11n) (use srfi-1 posix regex regex-case srfi-69 hostinfo md5 message-digest) (use directory-utils posix-extras matchable utils) (use spiffy uri-common intarweb http-client spiffy-request-vars) | > > > > | > > > > > > | > > | 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 | ;; along with Megatest. If not, see <http://www.gnu.org/licenses/>. ;; (declare (unit servermod)) (declare (uses commonmod)) (declare (uses configfmod)) (declare (uses mtmod)) (declare (uses debugprint)) (declare (uses mtargs)) (module servermod * (import scheme chicken) (use (srfi 18) extras s11n) (use srfi-1 posix regex regex-case srfi-69 hostinfo md5 message-digest) (use directory-utils posix-extras matchable utils) (use spiffy uri-common intarweb http-client spiffy-request-vars) (import ports data-structures files srfi-4 typed-records commonmod configfmod debugprint (prefix mtargs args:) mtmod ) (include "common_records.scm") (include "db_records.scm") (define (server:make-server-url hostport) (if (not hostport) #f |
︙ | ︙ | |||
157 158 159 160 161 162 163 | (debug:print 0 *default-log-port* "INFO: starting server at " (common:human-time)) (system (conc "nbfake " cmdln)) (unsetenv "TARGETHOST_LOGF") ;; (if (get-environment-variable "TARGETHOST")(unsetenv "TARGETHOST")) (thread-join! log-rotate) (pop-directory))) | < < < < | 169 170 171 172 173 174 175 176 177 178 179 180 181 182 | (debug:print 0 *default-log-port* "INFO: starting server at " (common:human-time)) (system (conc "nbfake " cmdln)) (unsetenv "TARGETHOST_LOGF") ;; (if (get-environment-variable "TARGETHOST")(unsetenv "TARGETHOST")) (thread-join! log-rotate) (pop-directory))) (define (server:logf-get-start-info logf) (let ((server-rx (regexp "^SERVER STARTED: (\\S+):(\\d+) AT ([\\d\\.]+) server-id: (\\S+) pid: (\\d+)")) ;; SERVER STARTED: host:port AT timesecs server id (dbprep-rx (regexp "^SERVER: dbprep")) (dbprep-found 0) (bad-dat (list #f #f #f #f #f))) (handle-exceptions |
︙ | ︙ | |||
197 198 199 200 201 202 203 | ((_ host port start server-id pid) (list host (string->number port) (string->number start) server-id (string->number pid))) (else | | < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < | 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 | ((_ host port start server-id pid) (list host (string->number port) (string->number start) server-id (string->number pid))) (else (debug:print 0 *default-log-port* "ERROR: did not recognise SERVER line info "mlst) bad-dat)))) (begin (if dbprep-found (begin (debug:print-info 2 *default-log-port* "Server is in dbprep at " (common:human-time)) (thread-sleep! 0.5)) ;; was 25 sec but that blocked things from starting? (debug:print-info 0 *default-log-port* "Unable to get server info from " logf " at " (seconds->time-string (current-seconds)))) bad-dat)))))))) (define (server:record->id servr) (handle-exceptions exn (begin (debug:print-info 0 *default-log-port* "Unable to get server id from " servr ", exn=" exn) #f) |
︙ | ︙ | |||
362 363 364 365 366 367 368 | #f) (match-let (((host port start-time server-id pid) servr)) (if (and host port) (conc host ":" port) #f)))) | < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < | 240 241 242 243 244 245 246 247 248 249 250 251 252 253 | #f) (match-let (((host port start-time server-id pid) servr)) (if (and host port) (conc host ":" port) #f)))) ;; oldest server alive determines host then choose random of youngest ;; five servers on that host ;; (define (server:get-servers-info areapath) ;; (assert *toppath* "FATAL: server:get-servers-info called before *toppath* has been set.") (let* ((servinfodir (server:get-servinfo-dir areapath))) ;; (conc *toppath*"/.servinfo"))) (if (not (file-exists? servinfodir)) |
︙ | ︙ | |||
562 563 564 565 566 567 568 | (debug:print 0 *default-log-port* "WARNING: found old server info file "sfile", removing it.") (handle-exceptions exn (debug:print 0 *default-log-port* "WARNING: failed to delete old server info file "sfile) (delete-file sfile)))))) sfiles))) | < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < | 407 408 409 410 411 412 413 414 415 416 417 418 419 420 | (debug:print 0 *default-log-port* "WARNING: found old server info file "sfile", removing it.") (handle-exceptions exn (debug:print 0 *default-log-port* "WARNING: failed to delete old server info file "sfile) (delete-file sfile)))))) sfiles))) ;; timeout is hms string: 1h 5m 3s, default is 1 minute ;; This is currently broken. Just use the number of hours with no unit. ;; Default is 600 seconds. ;; (define (server:expiration-timeout) (let* ((tmo (configf:lookup *configdat* "server" "timeout"))) (if (string? tmo) |
︙ | ︙ | |||
873 874 875 876 877 878 879 | (handle-exceptions exn (debug:print-error 0 *default-log-port* "failed to remove " fullname ", exn=" exn) (delete-file* fullname))))) files) (debug:print-info 0 *default-log-port* "Deleted " (length files) " files from logs, keeping " max-allowed " files.")))))) | < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < | 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 | (handle-exceptions exn (debug:print-error 0 *default-log-port* "failed to remove " fullname ", exn=" exn) (delete-file* fullname))))) files) (debug:print-info 0 *default-log-port* "Deleted " (length files) " files from logs, keeping " max-allowed " files.")))))) ;;====================================================================== ;; E X I T H A N D L I N G ;;====================================================================== (define (std-signal-handler signum) ;; (signal-mask! signum) (set! *time-to-exit* #t) ;;(debug:print-info 13 *default-log-port* "got signal "signum) (debug:print-error 0 *default-log-port* "Received signal " signum " aaa exiting promptly") ;; (std-exit-procedure) ;; shouldn't need this since we are exiting and it will be called anyway |
︙ | ︙ | |||
971 972 973 974 975 976 977 | ;; (std-exit-procedure) ;; shouldn't need this since we are exiting and it will be called anyway (exit)) (set-signal-handler! signal/int std-signal-handler) ;; ^C (set-signal-handler! signal/term std-signal-handler) ;;====================================================================== | > > > | < < | < | | | | | | < | | > | < | | < < > | | | > > | < < | < < | < < < < < < < < < < < < < < < | < < < < < < < < | 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 | ;; (std-exit-procedure) ;; shouldn't need this since we are exiting and it will be called anyway (exit)) (set-signal-handler! signal/int std-signal-handler) ;; ^C (set-signal-handler! signal/term std-signal-handler) ;;====================================================================== ;; calculate a delay number based on a droop curve ;; inputs are: ;; - load-in, load as from uptime, NOT normalized ;; - numcpus, number of cpus, ideally use the real cpus, not threads ;; (define (common:get-delay load-in numcpus) (let* ((ratio (/ load-in numcpus)) (new-option (configf:lookup *configdat* "load" "new-load-method")) (paramstr (or (configf:lookup *configdat* "load" "exp-params") "15 12 1281453987.9543 0.75")) ;; 5 4 10 1")) (paramlst (map string->number (string-split paramstr)))) (if new-option (begin (cond ((and (>= ratio 0) (< ratio .5)) 0) ((and (>= ratio 0.5) (<= ratio .9)) (* ratio (/ 5 .9))) ((and (> ratio .9) (<= ratio 1.1)) (+ 5 (* (- ratio .9) (/ 55 .2)))) ((> ratio 1.1) 60))) (match paramlst ((r1 r2 s1 s2) (debug:print 3 *default-log-port* "Using params r1=" r1 " r2=" r2 " s1=" s1 " s2=" s2) (min (max (/ (expt r1 (* r2 s2 ratio)) s1) 0) 30)) (else (debug:print 0 *default-log-port* "BAD exp-params, should be \"r1 r2 s1 s2\" but got " paramstr) 30))))) ;;====================================================================== ;; DO NOT CALL THIS DIRECTLY. It is called from common:wait-for-normalized-load ;; count - count down to zero, at some point we'd give up if the load never drops ;; num-tries - count down to zero number tries to get numcpus ;; (define (common:wait-for-cpuload maxnormload numcpus-in |
︙ | ︙ | |||
1095 1096 1097 1098 1099 1100 1101 | ", normalized effective load: " normalized-effective-load )) ;; overloaded and count expired (i.e. went to zero) (else (if (> num-tries 0) ;; should be "num-tries-left". (if (common:low-noise-print 30 (conc (round effective-load) "-load-acceptable-" effective-host)) (debug:print 0 *default-log-port* "Load on " effective-host " is acceptable at effective normalized load of " | | | 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 | ", normalized effective load: " normalized-effective-load )) ;; overloaded and count expired (i.e. went to zero) (else (if (> num-tries 0) ;; should be "num-tries-left". (if (common:low-noise-print 30 (conc (round effective-load) "-load-acceptable-" effective-host)) (debug:print 0 *default-log-port* "Load on " effective-host " is acceptable at effective normalized load of " normalized-effective-load " continuing.")) (debug:print 0 *default-log-port* "Load on " effective-host ", " first" could not be retrieved. Giving up and continuing.")))))) ;;====================================================================== ;; DO NOT CALL THIS DIRECTLY. It is called from common:wait-for-normalized-load ;; ;; (define (common:wait-for-cpuload maxload-in numcpus-in waitdelay #!key (count 1000) (msg #f)(remote-host #f)(force-maxload #f)(num-tries 5)) |
︙ | ︙ |
Modified tasksmod.scm from [afe09c0715] to [ba207adbea].
︙ | ︙ | |||
1106 1107 1108 1109 1110 1111 1112 | (if (tasks:set-area dbh configdat) (tasks:sync-to-postgres configdat dest) (begin (debug:print 0 *default-log-port* "ERROR: unable to create an area record") #f))))) | > > > > > > > > > > > > | > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > | 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 | (if (tasks:set-area dbh configdat) (tasks:sync-to-postgres configdat dest) (begin (debug:print 0 *default-log-port* "ERROR: unable to create an area record") #f))))) ;;====================================================================== ;; see defstruct host at top of file. ;; host: reachable last-update last-used last-cpuload ;; (define (common:update-host-loads-table hosts-raw) (let* ((hosts (filter (lambda (x) (string-match (regexp "^\\S+$") x)) hosts-raw))) (for-each (lambda (hostname) (let* ((rec (let ((h (hash-table-ref/default *host-loads* hostname #f))) (if h h (let ((h (make-host))) (hash-table-set! *host-loads* hostname h) h)))) (host-info (common:get-host-info hostname)) (is-reachable (car host-info)) (last-reached-time (cadr host-info)) (load (caddr host-info))) (host-reachable-set! rec is-reachable) (host-last-update-set! rec last-reached-time) (host-last-cpuload-set! rec load))) hosts))) ;;====================================================================== ;; ideally put all this info into the db, no need to preserve it across moving homehost ;; ;; return list of ;; ( reachable? cpuload update-time ) (define (common:get-host-info hostname) (let* ((loadinfo (rmt:get-latest-host-load hostname)) ;; if this host happens to have been recently used by a test reuse the load data (load (car loadinfo)) (load-sample-time (cdr loadinfo)) (load-sample-age (- (current-seconds) load-sample-time)) (loadinfo-timeout-seconds 6) ;; this was 20 seconds, seems way too lax. Switch to 6 seconds (host-last-update-timeout-seconds 4) (host-rec (hash-table-ref/default *host-loads* hostname #f)) ) (cond ((< load-sample-age loadinfo-timeout-seconds) (list #t load-sample-time load)) ((and host-rec (< (current-seconds) (+ (host-last-update host-rec) host-last-update-timeout-seconds))) (list #t (host-last-update host-rec) (host-last-cpuload host-rec ))) ((common:unix-ping hostname) (list #t (current-seconds) (alist-ref 'adj-core-load (common:get-normalized-cpu-load hostname)))) ;; this is cheaper than you might think. get-normalized-cpu-load is cached for up to 5 seconds (else (list #f 0 -1) ;; bad host, don't use! )))) ;;====================================================================== ;; Force a megatest cleanup-db if version is changed and skip-version-check not specified ;; Do NOT check if not on homehost! ;; (define (common:exit-on-version-changed) (if (and *toppath* ;; do nothing if *toppath* not yet provided (rmt:on-homehost?)) (if (common:api-changed?) (let* ((mtconf (conc (get-environment-variable "MT_RUN_AREA_HOME") "/megatest.config")) (dbfile (conc (get-environment-variable "MT_RUN_AREA_HOME") ".mtdb/main.db")) (read-only (not (file-write-access? dbfile))) (dbstruct (db:setup))) ;; (db:setup-db *dbstruct-dbs* *toppath* #f))) ;; #t))) (debug:print 0 *default-log-port* "WARNING: Version mismatch!\n" " expected: " (common:version-signature) "\n" " got: " (common:get-last-run-version)) (cond ((get-environment-variable "MT_SKIP_DB_MIGRATE") #t) ((and (common:file-exists? mtconf) (common:file-exists? dbfile) (not read-only) (eq? (current-user-id)(file-owner mtconf))) ;; safe to run -cleanup-db (debug:print 0 *default-log-port* " I see you are the owner of megatest.config, attempting to cleanup and reset to new version") (handle-exceptions exn (begin (debug:print 0 *default-log-port* "Failed to switch versions. exn=" exn) (debug:print 0 *default-log-port* " message: " ((condition-property-accessor 'exn 'message) exn)) (print-call-chain (current-error-port)) (exit 1)) (common:cleanup-db dbstruct))) ((not (common:file-exists? mtconf)) (debug:print 0 *default-log-port* " megatest.config does not exist in this area. Cannot proceed with megatest version migration.") (exit 1)) ((not (common:file-exists? dbfile)) (debug:print 0 *default-log-port* " .mtdb/main.db does not exist in this area. Cannot proceed with megatest version migration.") (exit 1)) ((not (eq? (current-user-id)(file-owner mtconf))) (debug:print 0 *default-log-port* " You do not own .mtdb/main.db in this area. Cannot proceed with megatest version migration.") (exit 1)) (read-only (debug:print 0 *default-log-port* " You have read-only access to this area. Cannot proceed with megatest version migration.") (exit 1)) (else (debug:print 0 *default-log-port* " to switch versions you can run: \"megatest -cleanup-db\"") (exit 1))))))) ;;====================================================================== ;; (begin ;; (debug:print 0 *default-log-port* "ERROR: cannot migrate version unless on homehost. Exiting.") ;; (exit 1)))) (define (common:wait-for-homehost-load maxnormload msg) (let loop ((start-time (current-seconds))) ;; we saw some instances of this being called before *toppath* was set. This might be an early setup race. This delay should help but it is impossible to test... (if (not *toppath*) (begin (debug:print 0 *default-log-port* "ERROR: common:wait-for-homehost-load called before *toppath* set.") (thread-sleep! 30) (if (< (- (current-seconds) start-time) 300) (loop start-time))))) (case (rmt:transport-mode) ((http) (let* ((hh-dat (if (rmt:on-homehost?) ;; if we are on the homehost then pass in #f so the calls are local. #f (server:choose-server *toppath* 'homehost))) (hh (if hh-dat (car hh-dat) #f))) (common:wait-for-normalized-load maxnormload msg hh))) (else (common:wait-for-normalized-load maxnormload msg (get-host-name))))) (define (configf:write-alist cdat fname) (if (not (common:faux-lock fname)) (debug:print 0 *default-log-port* "INFO: Could not get lock on " fname)) (let* ((dat (configf:config->alist cdat)) (res (begin (with-output-to-file fname ;; first write out the file (lambda () (pp dat))) (if (common:file-exists? fname) ;; now verify it is readable (if (configf:read-alist fname) #t ;; data is good. (begin (handle-exceptions exn (begin (debug:print 0 *default-log-port* "deleting " fname " failed, exn=" exn) #f) (debug:print 0 *default-log-port* "WARNING: content " dat " for cache " fname " is not readable. Deleting generated file.") (delete-file fname)) #f)) #f)))) (common:faux-unlock fname) res)) ;;====================================================================== ;; faux-lock is deprecated. Please use simple-lock below ;; (define (common:faux-lock keyname #!key (wait-time 8)(allow-lock-steal #t)) (if (rmt:no-sync-get/default keyname #f) ;; do not be tempted to compare to pid. locking is a one-shot action, if already locked for this pid it doesn't actually count (if (> wait-time 0) (begin (thread-sleep! 1) (if (eq? wait-time 1) ;; only one second left, steal the lock (begin (debug:print-info 0 *default-log-port* "stealing lock for " keyname) (common:faux-unlock keyname force: #t))) (common:faux-lock keyname wait-time: (- wait-time 1))) #f) (begin (rmt:no-sync-set keyname (conc (current-process-id))) (equal? (conc (current-process-id)) (conc (rmt:no-sync-get/default keyname #f)))))) (define (common:faux-unlock keyname #!key (force #f)) (if (or force (equal? (conc (current-process-id)) (conc (rmt:no-sync-get/default keyname #f)))) (begin (if (rmt:no-sync-get/default keyname #f) (rmt:no-sync-del! keyname)) #t) #f)) ;;====================================================================== ;; simple lock. improve and converge on this one. ;; (define (common:simple-lock keyname) (rmt:no-sync-get-lock keyname)) (define (common:simple-unlock keyname #!key (force #f)) (rmt:no-sync-del! keyname)) ;; returns waitons waitors tconfigdat ;; (define (tests:get-waitons test-name all-tests-registry global-waitons) (let* ((config (tests:get-testconfig test-name #f all-tests-registry 'return-procs))) ;; assuming no problems with immediate evaluation, this could be simplified ('return-procs -> #t) (let ((instr (if config (configf:lookup config "requirements" "waiton") (begin ;; No config means this is a non-existant test (debug:print-error 0 *default-log-port* "non-existent required test \"" test-name "\"") (exit 1)))) (instr2 (if config (configf:lookup config "requirements" "waitor") ""))) (debug:print-info 8 *default-log-port* "waitons string is " instr ", waitors string is " instr2) (let* ((newwaitons-tmp (string-split (cond ((procedure? instr) ;; here (let ((res (instr))) (debug:print-info 8 *default-log-port* "waiton procedure results in string " res " for test " test-name) res)) ((string? instr) instr) (else ;; NOTE: This is actually the case of *no* waitons! ;; (debug:print-error 0 *default-log-port* "something went wrong in processing waitons for test " test-name) "")))) (newwaitors (string-split (cond ((procedure? instr2) (let ((res (instr2))) (debug:print-info 8 *default-log-port* "waitor procedure results in string " res " for test " test-name) res)) ((string? instr2) instr2) (else ;; NOTE: This is actually the case of *no* waitons! ;; (debug:print-error 0 *default-log-port* "something went wrong in processing waitons for test " test-name) "")))) (newwaitons (if (and (list? global-waitons) (not (null? global-waitons))) (begin (debug:print 0 *default-log-port* "Adding global waitons " global-waitons) (append newwaitons-tmp (filter (lambda (x) ;; remove self from global waitons (not (equal? x test-name))) global-waitons))) newwaitons-tmp))) (values ;; the waitons (filter (lambda (x) (if (hash-table-ref/default all-tests-registry x #f) #t (begin (debug:print-error 0 *default-log-port* "test " test-name " has unrecognised waiton testname " x) #f))) newwaitons) (filter (lambda (x) (if (hash-table-ref/default all-tests-registry x #f) #t (begin (debug:print-error 0 *default-log-port* "test " test-name " has unrecognised waiton testname " x) #f))) newwaitors) config))))) ;; Check for waiver eligibility ;; (define (tests:check-waiver-eligibility testdat prev-testdat) (let* ((test-registry (make-hash-table)) (testconfig (tests:get-testconfig (db:test-get-testname testdat) (db:test-get-item-path testdat) test-registry #f)) (test-rundir ;; (sdb:qry 'passstr (db:test-get-rundir testdat)) ;; ) (prev-rundir ;; (sdb:qry 'passstr (db:test-get-rundir prev-testdat)) ;; ) (waivers (if testconfig (configf:section-vars testconfig "waivers") '())) (waiver-rx (regexp "^(\\S+)\\s+(.*)$")) (diff-rule "diff %file1% %file2%") (logpro-rule "diff %file1% %file2% | logpro %waivername%.logpro %waivername%.html")) (if (not (common:file-exists? test-rundir)) (begin (debug:print-error 0 *default-log-port* "test run directory is gone, cannot propagate waiver") #f) (begin (push-directory test-rundir) (let ((result (if (null? waivers) #f (let loop ((hed (car waivers)) (tal (cdr waivers))) (debug:print 0 *default-log-port* "INFO: Applying waiver rule \"" hed "\"") (let* ((waiver (configf:lookup testconfig "waivers" hed)) (wparts (if waiver (string-match waiver-rx waiver) #f)) (waiver-rule (if wparts (cadr wparts) #f)) (waiver-glob (if wparts (caddr wparts) #f)) (logpro-file (if waiver (let ((fname (conc hed ".logpro"))) (if (common:file-exists? fname) fname (begin (debug:print 0 *default-log-port* "INFO: No logpro file " fname " falling back to diff") #f))) #f)) ;; if rule by name of waiver-rule is found in testconfig - use it ;; else if waivername.logpro exists use logpro-rule ;; else default to diff-rule (rule-string (let ((rule (configf:lookup testconfig "waiver_rules" waiver-rule))) (if rule rule (if logpro-file logpro-rule (begin (debug:print 0 *default-log-port* "INFO: No logpro file " logpro-file " found, using diff rule") diff-rule))))) ;; (string-substitute "%file1%" "foofoo.txt" "This is %file1% and so is this %file1%." #t) (processed-cmd (string-substitute "%file1%" (conc test-rundir "/" waiver-glob) (string-substitute "%file2%" (conc prev-rundir "/" waiver-glob) (string-substitute "%waivername%" hed rule-string #t) #t) #t)) (res #f)) (debug:print 0 *default-log-port* "INFO: waiver command is \"" processed-cmd "\"") (if (eq? (system processed-cmd) 0) (if (null? tal) #t (loop (car tal)(cdr tal))) #f)))))) (pop-directory) result))))) ;; if .testconfig exists in test directory read and return it ;; else if have cached copy in *testconfigs* return it IFF there is a section "have fulldata" ;; else read the testconfig file ;; if have path to test directory save the config as .testconfig and return it ;; (define (tests:get-testconfig test-name item-path test-registry system-allowed #!key (force-create #f)(allow-write-cache #t)(wait-a-minute #f)) (let* ((use-cache (common:use-cache?)) (cache-path (tests:get-test-path-from-environment)) (cache-file (and cache-path (conc cache-path "/.testconfig"))) (cache-exists (and cache-file (not force-create) ;; if force-create then pretend there is no cache to read (common:file-exists? cache-file))) (cached-dat (if (and (not force-create) cache-exists use-cache) (handle-exceptions exn (begin (debug:print 0 *default-log-port* "failed to read " cache-file ", exn=" exn) #f) ;; any issues, just give up with the cached version and re-read (configf:read-alist cache-file)) #f)) (test-full-name (if (and item-path (not (string-null? item-path))) (conc test-name "/" item-path) test-name))) (if cached-dat cached-dat (let ((dat (hash-table-ref/default *testconfigs* test-full-name #f))) (if (and dat ;; have a locally cached version (hash-table-ref/default dat "have fulldata" #f)) ;; marked as good data? dat ;; no cached data available (let* ((treg (or test-registry (tests:get-all))) (test-path (or (hash-table-ref/default treg test-name #f) (let* ((local-tcdir (conc (getenv "MT_LINKTREE") "/" (getenv "MT_TARGET") "/" (getenv "MT_RUNNAME") "/" test-name "/" item-path)) (local-tcfg (conc local-tcdir "/testconfig"))) (if (common:file-exists? local-tcfg) local-tcdir #f)) (conc *toppath* "/tests/" test-name))) (test-configf (conc test-path "/testconfig")) (testexists (let loopa ((tries-left 30)) (cond ( (and (common:file-exists? test-configf)(file-read-access? test-configf)) #t) ( (common:file-exists? test-configf) (debug:print 0 *default-log-port* "WARNING: Cannot read testconfig file: "test-configf) #f) ( (and wait-a-minute (> tries-left 0)) (thread-sleep! 10) (debug:print 0 *default-log-port* "WARNING: testconfig file does not exist: "test-configf" will retry in 10 seconds. Tries left: "tries-left) ;; BB: this fires (loopa (sub1 tries-left))) (else (debug:print 2 *default-log-port* "WARNING: testconfig file does not exist: "test-configf) ;; BB: this fires #f)))) (tcfg (if testexists (read-config test-configf #f system-allowed environ-patt: (if system-allowed "pre-launch-env-vars" #f)) #f))) (if (and tcfg cache-file) (hash-table-set! tcfg "have fulldata" #t)) ;; mark this as fully read data (if tcfg (hash-table-set! *testconfigs* test-full-name tcfg)) (if (and testexists cache-file (file-write-access? cache-path) allow-write-cache) (let ((tpath (conc cache-path "/.testconfig"))) (debug:print-info 1 *default-log-port* "Caching testconfig for " test-name " in " tpath) (if (and tcfg (not (common:in-running-test?))) (configf:write-alist tcfg tpath)))) tcfg)))))) ;;====================================================================== ;; go through the hosts from least recently used to most recently used, pick the first that meets the load criteral from the ;; [host-rules] section. ;; (define (common:get-least-loaded-host hosts-raw host-type configdat) (let* ((rdat (configf:lookup configdat "host-rules" host-type)) (rules (common:val->alist (or rdat "") convert: #t)) ;; maxnload, maxnjobs, maxjobrate (maxnload (common:alist-ref/default 'maxnload rules 1.5)) ;; max normalized load (maxnjobs (common:alist-ref/default 'maxnjobs rules 1.5)) ;; max normalized number of jobs (maxjobrate (common:alist-ref/default 'maxjobrate rules (/ 1 6))) ;; max rate of submitting jobs to a given host in jobs/second (hosts (filter (lambda (x) (string-match (regexp "^\\S+$") x)) hosts-raw)) ;; (best-host #f) (get-rec (lambda (hostname) ;; (print "get-rec hostname=" hostname) (let ((h (hash-table-ref/default *host-loads* hostname #f))) (if h h (let ((h (make-host))) (hash-table-set! *host-loads* hostname h) h))))) (best-load 99999) (curr-time (current-seconds)) (get-hosts-sorted (lambda (hosts) (sort hosts (lambda (a b) (let ((a-rec (get-rec a)) (b-rec (get-rec b))) ;; (print "a=" a " a-rec=" a-rec " host-last-used=" (host-last-used a-rec)) ;; (print "b=" b " b-rec=" b-rec " host-last-used=" (host-last-used b-rec)) (< (host-last-used a-rec) (host-last-used b-rec)))))))) (debug:print 0 *default-log-port* "INFO: hosts-sorted=" (get-hosts-sorted hosts)) (if (null? hosts) #f ;; no hosts to select from. All done and giving up now. (let ((hosts-sorted (get-hosts-sorted hosts))) (common:update-host-loads-table hosts) (let loop ((hostname (car hosts-sorted)) (tal (cdr hosts-sorted)) (best-host #f)) (let* ((rec (get-rec hostname)) (reachable (host-reachable rec)) (load (host-last-cpuload rec)) (last-used (host-last-used rec)) (delta (- curr-time last-used)) (job-rate (if (> delta 0) (/ 1 delta) 999)) ;; jobs per second (new-best (cond ((not reachable) (debug:print 0 *default-log-port* "Skipping host " hostname " as it cannot be reached.") best-host) ((and (< load maxnload) ;; load is acceptable (< job-rate maxjobrate)) ;; job rate is acceptable (set! best-load load) hostname) (else best-host)))) (debug:print 0 *default-log-port* "INFO: Trying host " hostname " with load " load ", last used " delta " seconds ago, with job-rate " job-rate " for running a test." ) (if new-best (begin ;; found a host, return it (debug:print 0 *default-log-port* "INFO: Found host: " new-best " load: " load " last-used: " delta " seconds ago, with job-rate: " job-rate) (host-last-used-set! rec curr-time) new-best) (if (null? tal) #f (loop (car tal)(cdr tal) best-host))))))))) ;;====================================================================== ;; T E S T L A U N C H I N G P E R I T E M W I T H H O S T T Y P E S ;;====================================================================== ;; ;; [hosts] ;; arm cubie01 cubie02 ;; x86_64 zeus xena myth01 ;; allhosts #{g hosts arm} #{g hosts x86_64} ;; ;; [host-types] ;; general #MTLOWESTLOAD #{g hosts allhosts} ;; arm #MTLOWESTLOAD #{g hosts arm} ;; nbgeneral nbjob run JOBCOMMAND -log $MT_LINKTREE/$MT_TARGET/$MT_RUNNAME.$MT_TESTNAME-$MT_ITEM_PATH.lgo ;; ;; [host-rules] ;; # maxnload => max normalized load ;; # maxnjobs => max jobs per cpu ;; # maxjobrate => max jobs per second ;; general maxnload=1.1; maxnjobs=1.2; maxjobrate=0.1 ;; ;; [launchers] ;; envsetup general ;; xor/%/n 4C16G ;; % nbgeneral ;; ;; [jobtools] ;; # if defined and not "no" flexi-launcher will bypass "launcher" unless no match. ;; flexi-launcher yes ;; launcher nbfake ;; (define (common:get-launcher configdat testname itempath) (let ((fallback-launcher (configf:lookup configdat "jobtools" "launcher"))) (if (and (configf:lookup configdat "jobtools" "flexi-launcher") ;; overrides launcher (not (equal? (configf:lookup configdat "jobtools" "flexi-launcher") "no"))) (let* ((launchers (hash-table-ref/default configdat "launchers" '()))) (if (null? launchers) fallback-launcher (let loop ((hed (car launchers)) (tal (cdr launchers))) (let ((patt (car hed)) (host-type (cadr hed))) (if (tests:match patt testname itempath) (begin (debug:print-info 2 *default-log-port* "Have flexi-launcher match for " testname "/" itempath " = " host-type) (let ((launcher (configf:lookup configdat "host-types" host-type))) (if launcher (let* ((launcher-parts (string-split launcher)) (launcher-exe (car launcher-parts))) (if (equal? launcher-exe "#MTLOWESTLOAD") ;; this is our special case, we will find the lowest load and craft a nbfake commandline (let host-loop ((targ-host (common:get-least-loaded-host (cdr launcher-parts) host-type configdat)) (count 100)) (if targ-host (conc "remrun " targ-host) (if (> count 0) (begin (debug:print 0 *default-log-port* "INFO: Waiting for a host for host-type " host-type) (thread-sleep! (- 101 count)) (host-loop (common:get-least-loaded-host (cdr launcher-parts) host-type configdat) (- count 1))) (begin (debug:print 0 *default-log-port* "FATAL: Failed to find a host from #MTLOWESTLOAD for host-type " host-type) (exit))))) launcher)) (begin (debug:print-info 0 *default-log-port* "WARNING: no launcher found for host-type " host-type) (if (null? tal) fallback-launcher (loop (car tal)(cdr tal))))))) ;; no match, try again (if (null? tal) fallback-launcher (loop (car tal)(cdr tal)))))))) fallback-launcher))) ;; Do not rpc this one, do the underlying calls!!! (define (tests:test-set-status! run-id test-id state status comment dat #!key (work-area #f)) (let* ((real-status status) (otherdat (if dat dat (make-hash-table))) (testdat (rmt:get-test-info-by-id run-id test-id)) (test-name (db:test-get-testname testdat)) (item-path (db:test-get-item-path testdat)) ;; before proceeding we must find out if the previous test (where all keys matched except runname) ;; was WAIVED if this test is FAIL ;; NOTES: ;; 1. Is the call to test:get-previous-run-record remotified? ;; 2. Add test for testconfig waiver propagation control here ;; (prev-test (if (equal? status "FAIL") (rmt:get-previous-test-run-record run-id test-name item-path) #f)) (waived (if prev-test (if prev-test ;; true if we found a previous test in this run series (let ((prev-status (db:test-get-status prev-test)) (prev-state (db:test-get-state prev-test)) (prev-comment (db:test-get-comment prev-test))) (debug:print 4 *default-log-port* "prev-status " prev-status ", prev-state " prev-state ", prev-comment " prev-comment) (if (and (equal? prev-state "COMPLETED") (equal? prev-status "WAIVED")) (if comment comment prev-comment) ;; waived is either the comment or #f #f)) #f) #f))) (if (and waived (tests:check-waiver-eligibility testdat prev-test)) (set! real-status "WAIVED")) (debug:print 4 *default-log-port* "real-status " real-status ", waived " waived ", status " status) ;; update the primary record IF state AND status are defined (if (and state status) (begin (rmt:set-state-status-and-roll-up-items run-id test-id item-path state real-status (if waived waived comment)) ;; (mt:process-triggers run-id test-id state real-status) ;; triggers are called in test-set-state-status )) ;; if status is "AUTO" then call rollup (note, this one modifies data in test ;; run area, it does remote calls under the hood. ;; (if (and test-id state status (equal? status "AUTO")) ;; (rmt:test-data-rollup run-id test-id status)) ;; add metadata (need to do this way to avoid SQL injection issues) ;; :first_err ;; (let ((val (hash-table-ref/default otherdat ":first_err" #f))) ;; (if val ;; (sqlite3:execute db "UPDATE tests SET first_err=? WHERE run_id=? AND testname=? AND item_path=?;" val run-id test-name item-path))) ;; ;; ;; :first_warn ;; (let ((val (hash-table-ref/default otherdat ":first_warn" #f))) ;; (if val ;; (sqlite3:execute db "UPDATE tests SET first_warn=? WHERE run_id=? AND testname=? AND item_path=?;" val run-id test-name item-path))) (let ((category (hash-table-ref/default otherdat ":category" "")) (variable (hash-table-ref/default otherdat ":variable" "")) (value (hash-table-ref/default otherdat ":value" #f)) (expected (hash-table-ref/default otherdat ":expected" "n/a")) (tol (hash-table-ref/default otherdat ":tol" "n/a")) (units (hash-table-ref/default otherdat ":units" "")) (type (hash-table-ref/default otherdat ":type" "")) (dcomment (hash-table-ref/default otherdat ":comment" ""))) (debug:print 4 *default-log-port* "category: " category ", variable: " variable ", value: " value ", expected: " expected ", tol: " tol ", units: " units) (if (and value) ;; require only value; BB was- all three required (let ((dat (conc category "," variable "," value "," expected "," tol "," units "," dcomment ",," ;; extra comma for status type ))) ;; This was run remote, don't think that makes sense. Perhaps not, but that is the easiest path for the moment. (rmt:csv->test-data run-id test-id dat) ;; This was added in check-in a5adfa3f9a. Message was: "...added delay in set-values to allow for delayed write on server start" ;; I'm inserting an arbitrary rmt: call to force/ensure that the server is available to (hopefully) prevent a communication issue. (rmt:get-var "MEGATEST_VERSION") ;; this does NOTHING but ensure the server is reachable. This is almost certainly NOT needed :) ;; BB - commentiong out arbitrary 10 second wait (thread-sleep! 10) ;; add 10 second delay before quit incase rmt needs time to start a server. ))) ;; need to update the top test record if PASS or FAIL and this is a subtest ;;;;;; (if (not (equal? item-path "")) ;;;;;; (rmt:set-state-status-and-roll-up-items run-id test-name item-path state status #f) ;;;;;) (if (or (and (string? comment) (string-match (regexp "\\S+") comment)) waived) (let ((cmt (if waived waived comment))) (rmt:general-call 'set-test-comment run-id cmt test-id))))) ) |
Modified testsmod.scm from [b98dc11a3f] to [e40120c627].
︙ | ︙ | |||
105 106 107 108 109 110 111 | debugprint ))) (import directory-utils debugprint | | | | | | 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 | debugprint ))) (import directory-utils debugprint ;; commonmod ;; configfmod ;; dbmod ;; dbfile ;; megatestmod ) ) |
Added utils/run-plot.sh version [7b2ea3fedf].
> > > > > > > > > > > > | 1 2 3 4 5 6 7 8 9 10 11 12 | #!/bin/bash IGNORE_UNITS=portlogger,stml2,debugprint,mtargs,ods FILES=$(ls *mod.scm|grep -v import) if [[ utils/plot-uses.scm -nt utils/plot-uses ]];then oldcsc csc utils/plot-uses.scm fi ./utils/plot-uses todot $IGNORE_UNITS $FILES > unitdeps.dot dot unitdeps.dot -Tpdf -o unitdeps.pdf |