IRC logs for #openttd on OFTC at 2024-04-16
โด go to previous day
00:08:36 *** HerzogDeXtEr has quit IRC (Read error: Connection reset by peer)
00:19:01 *** Wormnest has quit IRC (Ping timeout: 480 seconds)
00:20:09 *** Wormnest has joined #openttd
00:34:20 <_glx_> no newgrf involved, that's probably better
00:50:53 *** asyncmeow has joined #openttd
00:50:53 <asyncmeow> hey! i'm wondering if it would be possible for me to make a tool to connect to a server as a spectator and export the data on each company to a prometheus metrics endpoint. what would be the bare minimum i'd need to reimplement protocol wise to get, say, just the data for the finances window for each company on a server?
00:54:47 *** efessel has joined #openttd
00:54:47 <efessel> asyncmeow: couldn't you just connect to the admin port and periodically query company finances?
00:57:16 <asyncmeow> how detailed is the info that returns? i'm eventually going to want info on the level of individual stations for example, or production/% transported values for individual industries
00:58:04 <efessel> Not that detailed unfortunately - IIRC just company balance and # of each type of vehicle
01:04:24 <asyncmeow> so i'm guessing reimplementing the MP protocol would be what i want ๐
01:05:28 *** gnu_jj_ has quit IRC (Ping timeout: 480 seconds)
01:05:37 <peter1138> The protocol starts with sending a full compressed (xz, hah) savegame to the client.
01:09:25 <asyncmeow> so i'd have to implemement uh. savegame parsing.
01:09:34 <asyncmeow> that sounds *fun* (... okay not really)
01:50:10 *** Wormnest has quit IRC (Quit: Leaving)
02:37:59 *** gnu_jj_ has joined #openttd
02:41:32 *** gnu_jj has quit IRC (Ping timeout: 480 seconds)
02:43:54 *** godbed_ has joined #openttd
02:47:16 *** debdog has quit IRC (Ping timeout: 480 seconds)
02:47:16 *** D-HUND has quit IRC (Ping timeout: 480 seconds)
04:10:55 <locosage> Protocol and savegame will only get you one-time data, if you want to update it continuously you'll need the whole game engine
04:12:53 <locosage> if you control the server it's simpler to just patch admin port to send what you want or write a game script
04:41:04 <DorpsGek> - Update: Translations from eints (by translators)
04:51:17 <asyncmeow> locosage: tbh i was thinking just connect and fetch the data once a minute, but also... both routes seem difficult to me anyway lmao, especially since most of my dev experience is in c#, and i don't really know cpp very well
04:51:50 <asyncmeow> having openttd company metrics in prometheus/grafana seems like a neat idea but not neat enough to figure all that out tbh
05:02:20 *** misterbungus has quit IRC (Quit: Logging off...)
05:13:01 <pickpacket> 14.0 IS THE MOST BEAUTIFUL GAME I'VE EVER SEEN!!!
05:13:22 <pickpacket> I mean, for real! The UI improvements are AWESOME!
05:13:28 <rutoks> asyncmeow: I wrote a gamescript that just logs everything that I want and parse stdout of the server
05:13:51 <pickpacket> I am constantly impressed by the pace of development for this game. It's amazing. You guys all rock
06:05:58 <truebrain> Second time was me ๐
06:08:42 <silent_tempest> Why is this in the sprites for Transport Tycoon?
06:25:05 <truebrain> okay, so uploading the Windows symbols failed for this nightly. Not sure why, but they did. And I can't retry, as I correctly protected the endpoint against overwriting files that already exist ๐
06:32:52 *** reldred has joined #openttd
06:32:52 <reldred> silent_tempest: Ask Chris Sawyer
06:33:04 <silent_tempest> Lol that guy still around?
06:33:16 <reldred> It predates OpenTTD by at least a decade and a half
06:33:19 *** belajalilija has joined #openttd
06:33:29 <reldred> Making mobile games or something
06:34:37 <belajalilija> Maybe she was family
06:34:45 <belajalilija> Maybe sheโs katie
06:35:18 <belajalilija> There was a park in rct called like katies world or something
06:38:26 <belajalilija> I put references to people in things i make
06:39:19 <belajalilija> In my grf im making fictional liveries for some people in here
06:40:57 <belajalilija> In cities skylines i name streets after friends, i name minecraft horses after singers and when i was little id name TTDLX towns after pets Iโve owned or taken care of
06:41:34 <locosage> not sure how to fix it though
07:11:44 <peter1138> Just turn it into regular language.
07:17:09 *** godbed is now known as debdog
07:43:58 <locosage> peter1138: go for it ;)
07:44:06 <locosage> I'm not doing anything with that code anyway
07:47:01 <andythenorth> articulated ships perhaps?
08:14:15 <kamnet> silent_tempest: This is Chris Sawyer in his early 80s hair metal phase. ๐
08:37:37 <peter1138> Hmm, how to abort worldgen if the NewGRF config won't allow towns to be built...
08:51:39 <locosage> having trouble clicking abort with zzzz? xD
08:55:16 <johnfranklin> Should be, sometimes the cursor is without response
08:58:23 <peter1138> It's threaded but still locked during parts of worldgen.
08:58:49 <peter1138> But I actually meant automatically abort early, instead of even trying.
08:58:59 <truebrain> peter1138: the abort code already happens when towns can't be build; so something like that? ๐
08:59:26 <peter1138> That happens once the map is generated and it's tried to build towns.
08:59:36 <peter1138> I want to abort before the map is generated at all.
08:59:46 <truebrain> the abort code should be the same?
09:00:04 <truebrain> as it is all worldgen, from that thread's perspective
09:01:34 <truebrain> `HandleGeneratingWorldAbortion`
09:01:41 <truebrain> bit weird name, but that should abort the generation
09:02:16 <truebrain> just make sure to do it inside the `try` in `_GenerateWorld`
09:08:54 <peter1138> > ci/woodpecker/pr/woodpecker Pipeline is running
10:40:25 <orudge> "Microsoft Corporation Shite Paper"
10:40:34 <orudge> I assume that's an OCR error...
10:41:45 <truebrain> depends on what is in it ๐
10:42:33 <orudge> ""Microsoft Authenticode Technology: Ensuring Accountability and Authenticity for Software Components on the Internet," Microsoft Corporation Shite Paper, Oct. 1996, 14 pages."
10:44:28 <truebrain> `***Please Note: This game is completely free and has an unlimited time trial. Any payments made will be used to keep the package up to date in the Windows Store. Source code is freely available upon request.***`
10:44:36 <truebrain> OpenTTD 1.7.2 on the MSStore
10:44:44 *** benszasty has joined #openttd
10:44:49 <truebrain> so many lies and wrong statements in that one sentence ๐
10:47:00 <orudge> truebrain: yes, I seem to recall we'd need to trademark OpenTTD to have Microsoft consider removing that, frustratingly.
10:47:36 <truebrain> what frustrates me more, is that it shows up first when searching for OpenTTD
10:47:45 <truebrain> while ours is much more recently updated
10:47:55 *** ahyangyi has joined #openttd
10:47:55 <ahyangyi> orudge: I still can't guess what was the "right" word instead of `shite`
10:48:25 <truebrain> ahyangyi: no worries, I didn't figure that one out either ๐
11:18:48 <peter1138> "Irregular" and "some or all"
11:19:04 *** bertvvvs has joined #openttd
11:19:05 <peter1138> (But no indication if that's everyone online or not)
11:27:37 <peter1138> Ideally we could do this before allowing the Generate button to be pressed, but it needs the NewGRFs to be loaded to know.
11:43:51 *** __karma has joined #openttd
11:43:51 <__karma> Hey, does this mean that I have to rebase? or is it another issue?
11:50:52 <efessel> peter1138: I know for a fact it's not always all but I cannot say for certain if sometimes it's all.
11:50:52 <efessel> I'll be watching the server closely today to gather more detail.
11:51:02 <efessel> (assuming you're referring to my issue)
11:59:30 <truebrain> efessel: but a savegame around the time it happens would be most valuable; just to see if it happens without user interaction ๐
12:00:12 <efessel> Working on it. Game resets in 1.5 hours, I'll have desync=3 on from the start.
12:02:46 <efessel> A player has tested their tree removal script without ill effects
12:06:58 <peter1138> We should probably fix the cache check at some point too.
12:54:08 <_glx_> hmm 4 users desync twice in 3 minutes window, would be interresting to see the commands between these 2 timestamps
12:54:24 <truebrain> the question for me is more: was it a command?
12:54:29 <truebrain> or just depending on when you joined? ๐
12:54:59 <truebrain> but that is why I would like a savegame, so we can run it for a while and see if that desyncs clients that join later
12:55:14 <truebrain> I want a savegame! ๐
12:55:24 <efessel> I have a bunch of autosaves
12:55:29 <efessel> not sure if that'll help
12:55:32 <truebrain> zip them, attach them to the ticket
12:55:53 <truebrain> there are three types of desyncs .. those caused by player actions, those caused by NewGRFs, and those caused by the passing of time
12:56:02 <truebrain> the second is ruled out, as, no NewGRFs ๐
12:56:22 <truebrain> the latter could for example be that when you join, you get another cache or what-ever (for example with ships, or unbunching, or <insert new feature here>)
12:56:29 <truebrain> it can take a while for that to show, ofc
12:57:08 <truebrain> so any savegame that was just before or after a desync helps there, as we can just run that for a long time and see what it does ๐
12:57:59 <_glx_> the ones around [2024-04-16 01:46:44] could be interresting
12:59:45 <LordAro> truebrain: don't rule out old features :p
12:59:55 <truebrain> just less likely, but sure
13:00:10 <truebrain> I mean, we are desync free for how many years now? ๐
13:00:17 <truebrain> (with vanilla non-NewGRF games)
13:00:58 <_glx_> if only we had betas or RCs to detect these before final release ๐
13:01:13 <truebrain> that is easy to say, but doesn't really mean anything
13:01:21 <truebrain> as in, it might be a player that builds in a certain way
13:01:51 <_glx_> would be nice to have a nightly based server
13:02:12 <truebrain> hold a weekly "test our nightly" game ๐
13:05:38 <peter1138> Comparing cargo caches does not currently work properlky.
13:07:07 <peter1138> memcpy and memcmp of a std::map is not going to go well.
13:08:39 <peter1138> And CargoList::~CargoList() calls delete on its pointers, so you can't just make a copy of it and compare the two in a C++ way.
13:10:51 <peter1138> Maybe we only need to compare count and cargo_periods_in_transit, as those are the things that InvalidateCache() touch.
13:12:39 <truebrain> JGRPP has a bunch more validation routines for all kind of things
13:12:42 <truebrain> might also be worth exploring
13:13:38 <truebrain> efessel: would you be able to give us the times of those autosaves, so we can correlate them? In the zip-file they all got the same timestamp
13:16:36 <efessel> Doh - edited the comment to dump timestamps
13:18:55 <_glx_> and I edited your comment to reorder in human order ๐
13:21:10 <truebrain> owh and efessel , when you have the desync=3 logs, they are ofc also more than welcome ๐ More information is better ๐
13:21:12 <peter1138> MILLISECONDS_PER_TICK=1
13:21:40 <truebrain> or just remove the small line of code preventing you from pressing FF ๐
13:22:02 <peter1138> Press the button your Apple's touchbar...
13:22:07 <truebrain> your idea is possibly better ๐
13:22:34 <_glx_> in theory FF works, it's just require a server a lot slower than any client ๐
13:22:52 <_glx_> else none will be able to catch up
13:43:33 <peter1138> Oh, my auto <=> operators don't work :S
14:23:02 <peter1138> Oh, they do work, I just forgot that Debug(desync, ...) is special...
14:34:00 <truebrain> _glx_: game-speed is not sync'd over network, so "working" is a stretch ๐
14:35:01 <_glx_> so clients should be able to run at the same speed as server
14:36:48 *** bertvvvs has quit IRC (Quit: You have a nice day.)
14:38:47 <truebrain> kinda, a bit, but not really; clients still try to sleep once in a while
14:38:55 <truebrain> or they timeout real quick
14:41:10 <locosage> with cmclient it's already possible to set gamespeed in mp and I'm yet to notice any issues with it
14:41:13 <locosage> other than obvious ones
14:42:27 <truebrain> lolz, I just crashed a server by timing out loading the map ๐
14:42:34 <truebrain> which isn't a surprise, as the server is being REAL QUICK
14:42:56 <truebrain> but the SendMap seems to lack proper protection to see that the client is no longer there ๐
14:47:27 <truebrain> I really need to rewrite this lag protection ๐
14:49:10 <truebrain> _glx_: for your information, in reality is becomes a slide-slow, when you fiddle with the game-speed ๐
14:49:18 <truebrain> (the client runs it in a busy-loop)
14:51:52 <truebrain> frame-lag of 2100+, lolz
14:55:54 <truebrain> even with `-vnull` I can't get the client to keep up ๐
14:58:54 <truebrain> at least with `-vnull` the client doesn't notice this, and keeps on going ๐ With SDL it goes like: he buddy, something is really odd, I am going to disconnect now ๐
14:59:53 <_glx_> I guess with sdl at some point the OS goes "this program is not reacting"
15:00:17 <truebrain> no, we have game code that picks up on us being REALLY slow compared to the server ๐
15:00:44 <truebrain> is there a console command to see the current in-game date?
15:01:07 *** Wormnest has joined #openttd
15:20:45 <truebrain> seems I can push it to about 300 frames/s on this map. Not great, but also not all that bad
15:25:34 <truebrain> all trains are "lost" .. but they are all going from A to B ๐
15:25:43 <truebrain> not sure what is going on there ๐
15:34:26 <truebrain> by just running the savegame, and joining clients at different moments in time, nothing seems to happen
15:43:38 <truebrain> so yeah efessel , you should have those `dump_cmds` files if you used `-ddesync=3` .. would love to get my hands on them ๐
16:17:26 *** HerzogDeXtEr has joined #openttd
16:19:09 *** Flygon has quit IRC (Quit: A toaster's basically a soldering iron designed to toast bread)
16:37:41 *** bertvvvs has joined #openttd
16:55:18 <efessel> truebrain: No desync's since it's been on ๐ฆ
17:09:25 <peter1138> Schrodinger's desync
18:27:28 <LordAro> should #12474 be backported?
19:33:49 *** gelignite has joined #openttd
19:41:00 <efessel> truebrain: what do you need from me ๐
19:41:08 <kuhnovic> LordAro: @glx22 (via GitHub) yes I think so. I'll add the label
19:42:06 <_glx_> (yeah reply to bot are a pain)
19:42:27 <_glx_> efessel: logs, savegames and command dumps
19:42:51 <efessel> I have about 300 dmp_cmds files, just need latest few?
19:43:49 <Rubidium> we'd need them from just before the moment the last user that didn't desync joined
19:46:12 <_glx_> any [desync:2] in your log ?
19:46:43 <efessel> ```[2024-04-16 19:39:49] dbg: [net:1] 'OxO' reported an error and is closing its connection: desync error
19:46:43 <efessel> *** OxO has left the game (desync error)
19:46:43 <efessel> [2024-04-16 19:39:49] dbg: [net:3] [server] Client #92 closed connection```
19:46:44 <peter1138> Hope it's not some 4kx4k map ๐
19:46:49 <efessel> that's what they look like
19:48:46 <Rubidium> you won't find the [desync:2] in the normal logs, but rather the commands-out.log
19:49:45 <_glx_> ah yes desync debug level is special
19:50:03 <Rubidium> though I'd suggest start adding that and the first savegame, then we can look through the log and request more savegames if needed
19:50:14 <Rubidium> (that being commands-out.log)
19:52:05 <efessel> LMK if you need anything else
19:54:46 <_glx_> static bool stFindNearestSafeTile(const Train *v, TileIndex t1, Trackdir td, bool override_railtype)
19:55:09 <_glx_> [2024-04-16 19:49:20] cmd: 000af6ef; 03; 07; 00000008; 00000fad; 98141E0000050000000004050000 (CmdBuildSingleSignal)
19:55:09 <_glx_> [2024-04-16 19:49:21] cmd: 000af6ef; 1b; 07; 00000008; 00000fad; 9B1C1E0000050000000004050000 (CmdBuildSingleSignal)
19:55:09 <_glx_> [2024-04-16 19:49:21] cmd: 000af6ef; 26; 07; 00000008; 00000fad; 9B181E0000050000000004050000 (CmdBuildSingleSignal)
19:55:09 <_glx_> [2024-04-16 19:49:21] CACHE ERROR: FindSafeTile() = [T, F]
19:55:09 <_glx_> [2024-04-16 19:49:21] cmd: 000af6ef; 2b; 07; 00000008; 00000fad; 9B181E0000050000000004050000 (CmdBuildSingleSignal)
19:55:13 <truebrain> why couldn't that just be more like the rest .. lol
19:55:48 <truebrain> did we change anything there?
19:56:13 <truebrain> there are a lot of `CACHE ERROR` entries btw
19:56:26 <_glx_> I was just looking at the last one
19:56:45 <truebrain> some happeen with only CmdLevelLand nearby
19:56:51 <truebrain> so not sure it is actually related to the command ๐
19:58:02 <truebrain> only called when reversing a train, it seems
19:58:10 <truebrain> owh, reserve .. not reverse
19:58:37 <_glx_> yeah I think it's to find a safe location to stop, usually a signal
19:59:37 <truebrain> honestly, it happens so often, and also after the clients desyncs (like 3s after) .. I wonder if it is related to the desync
19:59:45 <truebrain> ideally we have another desync, to find correlation ๐
20:00:42 <_glx_> the check does FindNearestSafeTile with and without cache and compare results
20:01:19 <efessel> is it safe to clear out the old .sav files, I assume these are not needed?
20:01:55 <truebrain> that is an ambigious answer .. "is it safe? no" or "are not needed? no" ๐
20:02:19 <efessel> There's not a huge space concern so not a pressing issue
20:03:00 <truebrain> do you also happen to have the first savegame from when the server started?
20:03:01 <Rubidium> the user called "Lama" joined at 15:07 and got desynced at 19:39
20:03:05 <truebrain> docs say you can use that to replay the whole game ๐
20:03:20 <truebrain> (so the save from 13:59)
20:03:24 <_glx_> [2024-04-16 13:59:43] save: 000acfda; 00; dmp_cmds_d601a577_000acfda.sav
20:03:25 <efessel> yes, is that the `dmp_cmds_00000000_00000000.sav`?
20:03:33 <truebrain> yeah, I assume it is
20:03:42 <truebrain> `[2024-04-16 13:59:39] save: 00000000; 00; dmp_cmds_00000000_00000000.sav`
20:03:52 <_glx_> that's the main menu ๐
20:04:08 <truebrain> so `[2024-04-16 13:59:43] save: 000acfda; 00; dmp_cmds_d601a577_000acfda.sav`
20:04:58 <truebrain> mostly curious if this replaying still works and all ๐
20:07:14 <Rubidium> at 14:48 player223 joined and given the logs never left, so that would imply hasn't desynced. Meaning the divergence potentially started somewhere between 14:48 and 15:07
20:07:22 <efessel> Edited comment with the update
20:09:39 <truebrain> hmm .. fails to parse the commands log ..
20:10:41 <_glx_> hmm so start/stop of a vehicle triggered it ? (just reading the logs)
20:10:51 <truebrain> `new_map` is not in the list, lol
20:11:42 <Rubidium> you'd need to trim everything from newmap and before in any case
20:12:03 <_glx_> trim before "save" I think
20:12:21 <truebrain> docs didn't mention that, does it?
20:12:45 <truebrain> hmm ... can't tell if it is actually doing anything ๐
20:14:02 *** jamesrandal1463 has joined #openttd
20:14:22 <truebrain> ah, debug to `desync` doesn't show up on console
20:14:30 <truebrain> needed to change that to `misc` before I could see it doing anything
20:14:52 *** HerzogDeXtEr has quit IRC (Read error: Connection reset by peer)
20:15:17 <Rubidium> the docs sort-of mentions it: "Repeat the replay starting from incrementally later 'dmp_cmds_*.sav' while truncating the 'commands.log' at the beginning appropriately". Since you didn't start with the first one... though the documentation could use some improvements on that front
20:15:30 <truebrain> yeah, but that is much later ๐
20:16:43 <kuhnovic> I'm really hoping it's not the ship PF ๐
20:16:59 <truebrain> even if it is, that is okay
20:17:12 <Rubidium> what worries me for debugging is that it takes over 4 hours for Lama to desync
20:17:15 <truebrain> efessel: do you also have `dmp_cmds` from around 14:48 for us?
20:17:35 *** ChanServ sets mode: +v tokai
20:17:44 <truebrain> and maybe a few from around that time, for good measure
20:17:58 <truebrain> `[2024-04-16 22:17:42] dbg: [misc:0] Trying to parse: CACHE ERROR: FindSafeTile() = [T, F]`
20:18:02 <truebrain> fails on those lines ๐
20:18:39 <truebrain> Rubidium: we can replay at ~10x the speed of normal games, so that isn't all that bad, I guess
20:19:14 <truebrain> efessel: and one from around 15:07 .. well, as you see, more data is better ๐
20:19:17 <truebrain> sorry for all the asks ๐
20:19:38 <efessel> No prob, the least I can do - you guys do all the amazing work.
20:19:47 <truebrain> only if we can find this ๐
20:21:16 <truebrain> so if I understand the docs right, the replay from the start should result in no problems, as what the server has seen should be what we are seeing after playback?
20:21:35 <truebrain> but if we would replay from a later savegame, things should go a bit wonky?
20:24:25 <Rubidium> yes, and I suspect the first savegame to go wonky to be one from somewhere between 14:48 and 15:07
20:24:39 <truebrain> reason I already requested them, yes ๐ Good!
20:25:01 <truebrain> owh, I can replay even faster as I don't have clients connected ๐
20:29:24 <truebrain> funny how verbose CmdLevelLand is ๐
20:29:32 <truebrain> so many commmaaannndddssss
20:30:02 <efessel> I think there's a user who uses a script to clear trees, I'm guessing that's the command that you see?
20:30:18 <truebrain> No, CmdLevelLand, so terraforming
20:30:29 <truebrain> at least, I assume ๐
20:30:45 *** gelignite has quit IRC (Quit: Stay safe!)
20:30:59 <truebrain> Rubidium: did you also look what player joined last and still desync'd?
20:31:03 <truebrain> okay, full replay finished; that is doable, time-wise. And indeed, no issues found. Now .. let's start with ad7c0.
20:32:48 <Rubidium> truebrain: OxO at 19:23
20:33:12 <truebrain> efessel: guess what .. do you have a save from just before and after 19:23? ๐
20:34:12 <efessel> sure, will be taking off for family time after this one though, any others you need?
20:34:28 <truebrain> nah; that will be enough for now ๐
20:34:54 <_glx_> [2024-04-16 19:23:29] save: 000af460; 00; dmp_cmds_d601a577_000af460.sav
20:36:46 <truebrain> ad7c0 caused no desync on replay
20:39:40 <truebrain> and that is a relative short reply
20:39:51 <truebrain> (tip: make `GetGameInterval` return `0` ๐ )
20:40:02 <truebrain> `[2024-04-16 22:39:15] dbg: [misc:0] Sync check: 000af5f9; 00; mismatch expected {30e73fed, 13f06d00}, got {9f836808, a3dfa437}`
20:40:16 <truebrain> so 1 tick after CmdStartStopVehicle
20:42:02 <truebrain> only real change in that function is the reset of unbunching
20:42:37 <_glx_> but start stop is a cmdf
20:44:03 <truebrain> true; but the game was in sync a tick before that command
20:44:48 <truebrain> replaying failed command doesn't result in any additional reports from the desync debugger
20:45:09 <truebrain> if I replay the failed commands, it hits on the above
20:45:16 <truebrain> `[2024-04-16 22:44:28] dbg: [misc:0] Sync check: 000af5fe; 00; mismatch expected {665f5d6b, e891895b}, got {f162bc7a, 0c238ed8}`
20:45:56 <truebrain> (which is a few ticks later)
20:47:44 <truebrain> `af5e0` dmp_cmds did not trigger a desync. So I guess the actual problem happens slightly earlier?
20:49:23 <truebrain> efessel: sorry ... one more: everything between af480 and af5e0 would be much appreciated
20:49:34 <truebrain> the smaller we can get the time between replay and desync, the easier it gets ๐
20:50:45 <truebrain> on the other hand .. I guess the problem could have been introduced WAAAYYY earlier ๐
20:52:31 <_glx_> the idea is to get the exact command causing the issue
20:53:07 <truebrain> I would not consider that likely ๐ But it might be easier to compare savegames to see what is different
20:53:31 <truebrain> but I basically need "all savegames" from efess, to find the first savegame that fails to replay ๐
20:53:37 <truebrain> that might give an idea what is the direct cause
20:54:19 <truebrain> but it is like 100MB of saves per hour, and I need 4 hours of them ๐
20:54:45 <truebrain> but okay ... the docs say I should also start seeing those dmp_cmds.savs, but I do not
20:54:58 <Rubidium> in theory you can regenerate them if you rerun from the start and save
20:55:00 <truebrain> as I guess I could just recreate the saves locally
20:55:10 <truebrain> so why don't they show up for me?
20:56:12 <truebrain> because I am a peanut
20:57:24 <truebrain> right ... let's try this again, but now properly, shall we?
20:57:30 <truebrain> too bad it doesn't auto-skip entries from command-log
21:01:35 <truebrain> efessel: cancel request; no longer needed ๐
21:01:50 <truebrain> replaying is a lot slower with all these desync files being created ๐
21:02:09 *** nielsm has quit IRC (Remote host closed the connection)
21:02:11 <peter1138> But also more useful.
21:02:20 <truebrain> no longer having to ask efess for them, yes ๐
21:02:29 <efessel> that's right because here's the whole lot ๐
21:03:17 <efessel> Heading out, I'll be back in ~3 hours
21:10:29 <truebrain> okay, wrote a few lines that skip the commands.log till the frame we are at. Saves me from doing that ๐
21:11:01 <truebrain> now to replay a lot of files, to find the first savegame that causes the desync ๐
21:23:17 <truebrain> luckily for us I have a lot of cores to replay shit on ๐
21:24:11 <truebrain> still takes for-evah!
21:32:19 <truebrain> it is funny, it is a self-correcting desync ๐
21:32:49 <truebrain> for a long time, any client joining will get a desync at a certain moment, but all those joining after that certain moment, will not have this issue
21:35:44 <truebrain> updated my comment with the savegames that do and don't desync
21:38:52 <_glx_> and we will find it's a weird UB somewhere ๐
21:42:30 <truebrain> just a bit weird that some savegames desync, and the next doesn't, but the next after does again
21:43:46 <truebrain> enabled cmdf, let's see if that changes the result
21:44:46 <truebrain> no NewGRFs, sorry ๐
21:45:51 *** keikoz has quit IRC (Ping timeout: 480 seconds)
21:51:29 <truebrain> with cmdf enabled, a full replay desyncs ๐ฎ
21:53:46 <_glx_> hmm so a failed command affects server state
21:54:20 <truebrain> it might also be the result of the problem at hand
21:54:31 <truebrain> that the client thinks it can start/stop a vehicle, for example, and the server is like: no!
21:56:19 <_glx_> but client just runs in test mode before sending to server which will then send it back to client for execution
21:56:43 <truebrain> with replay, a cmdf can also become a cmd
21:57:23 <truebrain> so say, I want to start a vehicle. My client says: sure, state says you can. Send it to server, server goes: wuth? No! Records a cmdf
21:57:39 <truebrain> now we replay, and we get a cmdf (read as cmd), and sometimes we can execute it, and sometimes we can't
21:57:45 <truebrain> replicated that same behaviour
21:58:29 <truebrain> so from my understanding at least, it doesn't mean the testing influences the state, but more that it can be caused by some state difference that already exists
22:08:16 <_glx_> start/stop command flow is really simple, one of the only possible "failure" I can see is `Train::From(v)->gcache.cached_power`
22:09:47 <truebrain> hmm ... I am comparing savegames, but they have a much bigger difference than I expected
22:10:20 <truebrain> I start at af5c0, play till af5e0, and save. I would expect the dmp_cmds and this game to be identical
22:11:26 <truebrain> no, that I can see (I have the savegames as JSON)
22:11:33 <truebrain> owh, position of vehicles is not the same between saves
22:11:54 <_glx_> ah yes that too if not at the exact same moment
22:12:03 <truebrain> it is the EXACT same moment
22:14:45 <truebrain> might also just be the JSON ๐
22:18:17 <truebrain> according to one savegame, an effect is in slot 562, in the other it is in 2201. Otherwise they are identical
22:19:27 <truebrain> okay, when I look at the same savegame in my Rust decoder and my Python decoder, I get other IDs
22:20:15 <truebrain> ah, no, I was just looking at the wrong savegame in one of the two
22:20:47 <truebrain> the DATE chunk is identical
22:21:05 <truebrain> I would expect also effects to be in the same indexes, tbh
22:23:01 <_glx_> I would expect the same
22:24:47 <truebrain> double-checked my results, but the order of vehicles is actually different in the saves
22:25:10 <truebrain> which is ofc fine, if all references are updated too ๐
22:26:29 <truebrain> if I open and save the savegame, the results are identical
22:26:34 <truebrain> so it is not my tooling ... I tihnk ๐
22:26:49 <peter1138> Are the indices the same?
22:27:11 <truebrain> the indices are the order, not? /me double checks ...
22:27:52 <truebrain> owh, it is a sparse-table
22:27:55 <peter1138> But I wasn't sure what you meant with "order" considering you are also talking about JSON.
22:28:05 <truebrain> JSON outputs it as a dict
22:28:23 <truebrain> but we never validated my decoders on this level
22:28:29 <truebrain> so I am triple checking the hell out of this ๐
22:28:56 <peter1138> Anyway, if the order is different, and vehicles have a different index between client and server, that's a problem.
22:28:58 <truebrain> type 4 is sparse-table, which does read the index from the file
22:29:26 <_glx_> v->index is the thing to validate in the tool output ๐
22:29:36 <truebrain> which doesn't exist
22:29:41 *** bertvvvs has quit IRC (Ping timeout: 480 seconds)
22:29:41 <truebrain> that is the whole point / problem ๐
22:30:03 <peter1138> Yeah, pool index comes from the array order.
22:30:04 <truebrain> the index is stored not in the struct, but as part of the array/table itself
22:30:35 <truebrain> okay, I really do seem to read that correctly
22:30:56 <truebrain> but shouldn't that break tons of things like references?
22:32:14 <truebrain> we have REF_VEHICLE
22:32:17 <truebrain> which points to an index
22:32:26 <truebrain> so if when loading index is different ..
22:32:34 <truebrain> doesn't that link the weirdest things together?
22:33:03 <peter1138> There's no way it should change between save then load.
22:33:28 <peter1138> But what might happen is the index used by the client is different to the index used by the server?
22:33:57 <peter1138> But even in that case, both server and client saves would still be coherent with themselves, but of course not match either other.
22:34:27 <truebrain> so my diff should only be on new entries, you mean?
22:34:35 <_glx_> that could explain the start/stop issue
22:34:37 <peter1138> I don't really know what you are diffing ๐
22:34:48 <truebrain> I load game N, play it to N+1
22:34:53 <truebrain> and I have N+1 from the server
22:34:55 <truebrain> I dump them to JSON
22:35:03 <truebrain> and I see a big difference in the VEHS chunk
22:35:19 <truebrain> vehicles are "swapped" with each other
22:35:59 <peter1138> Assuming the JSON output is in the correct order.
22:36:12 <peter1138> But if it wasn't you could expect everything to be in an incorrect order.
22:36:32 <peter1138> Although, free pool bitmap is used by everything.
22:36:41 <truebrain> cargo packets etc also differ
22:36:45 <truebrain> it is not only vehicles
22:36:51 <truebrain> but vehicles show more clearly (lot of effects)
22:36:53 <_glx_> would mean FindFirstFree is not consistant
22:37:24 <_glx_> (or the underlying cache)
22:37:34 <peter1138> Might be possible to add debug output for what FindFirstFree returns.
22:38:34 <peter1138> But it must be triggered by something 'different' otherwise all network games without be desyncing often. And mine isn't.
22:38:41 <truebrain> okay, I loaded N-1, which gives the same N+1 as N
22:40:34 <truebrain> okay, N+1 also doesn't desync, where N and N-1 do
22:40:43 <truebrain> so the order seems "stable" until it isn't
22:40:58 <peter1138> You could even perhaps dump out the whole used_bitmap vector, although it's quite large, but could give a some clues .
22:41:58 <truebrain> but this would explain the flip/flop I see, where some older savegames do work, and some newer don't
22:42:28 <truebrain> also why with cmdf enabled it fails earlier, as in some cases the vehicle will be there, and in other cases it won't (under that index)
22:43:13 <truebrain> if this is true, I should be able to reply a much older game, and get the same savegame back .. let's try that
22:43:58 <truebrain> sadly, that takes for ever ofc, but okay
22:44:01 <peter1138> Any other pools showing discrepancies?
22:44:28 <truebrain> cargo.packets refs are different
22:44:32 <peter1138> If vehicles are different that could easily cause cargo packets to also be different.
22:44:40 <truebrain> but diffing a JSON is sucky, as it doesn't show context ๐
22:44:59 <peter1138> As the vehicles will be processed in a different order, so the cargo packets would be generated/deleted in a different order too.
22:45:00 <truebrain> okay, found the chunk, cargopackets are also flipped around
22:45:43 <truebrain> CITY as a result also has different values
22:47:13 <peter1138> Things like vehicle_cmd.cpp:204
22:48:19 <peter1138> A vehicle is bought, and then if it's not DC_EXEC it's sold again.
22:48:38 <peter1138> That will touch the used_bitmap.
22:49:27 <truebrain> oof, I had cmdf enabled, so now games that didn't fail earlier, do fail. This tricky bastard is constantly hiding ๐
22:50:11 <truebrain> let's disable cmdf again
22:52:38 <truebrain> comparing JSON with online-tools ... it hangs my browser ๐
22:54:55 <_glx_> maybe dump used_bitmap at the end of ResizeFor()
22:55:25 <truebrain> too late in the evening for me to do those kind of tests ๐
22:55:58 <truebrain> but if you replay `af5c0` to `af5e0`, you can see the savegames differ
22:56:02 <truebrain> it is a very short replay
22:56:16 <truebrain> there is a buy/sell in there
22:57:03 <peter1138> Which files do I need to download?
22:57:16 <truebrain> those two, and the console-cmds.log
22:57:23 <truebrain> (and follow the desync docs to get it running)
22:58:30 <truebrain> or you can give me a patch, and I can apply it
23:02:47 <truebrain> hmm .. this is odd ... I replay everything from the start now, and it desyncs
23:02:51 <truebrain> but it didn't do that earlier today
23:03:51 <truebrain> and they only change I made, is that I save the game now just before the desync happens
23:05:40 <peter1138> Didn't see af5c0 to af5e0 in the ticket, so downloaded the dump_cmds.gz, which is actually a tar.gz. Are extensions hard? :p
23:08:20 <peter1138> console-cmds.log is...
23:09:37 <truebrain> in "desync_artifacts.zip" if I remember correctly
23:10:48 <truebrain> okay, went over my changes to the source again, minimized them, running the desync test again from start .. 4 times at once. It really didn't desync earlier, so grr if it does now
23:11:26 <peter1138> So even with the log it's randoM ๐
23:11:39 <truebrain> that I am checking now
23:11:49 <truebrain> it just takes a long time to run this ๐
23:12:05 <truebrain> but when it does crash, the vehicle pool is identical to all my other savegames when it crashes
23:12:22 <truebrain> so the order is not random
23:12:38 <truebrain> it is also not deterministic enough, it seems ๐
23:12:58 <truebrain> but I am still not ruling out that it is a TrueBrain issue ๐
23:13:40 <truebrain> this is really not good
23:13:45 <truebrain> 2 out of the 4 runs desync'd
23:14:07 <truebrain> it seems to be in a superposition, and once observed, it takes on one of two states
23:14:36 <truebrain> and this 100% is not a TrueBrain issue ๐ As I started (copy/pasted) 4 identical desync tests ๐
23:19:38 <truebrain> okay ... how ever often I run 5e0, savegame is always the same. And how ever often I run 5c0, savegame is also always the same. Just not the same as 5e0. But when I run from the start, it takes on one of the two (but it does take on only one of those two)
23:19:58 *** bertvvvs has joined #openttd
23:21:26 <truebrain> I guess this would also explain the cmdf for CmdInsertOrder, which are a bit odd to see .. like, a unmodified client doesn't even allow you to do that
23:22:10 <truebrain> right, I need some sleep. Will look into this more tomorrow, if you haven't already found out what is going on ๐
23:25:46 <efessel> I wouldn't be surprised if players are using modified clients
23:29:51 <peter1138> 14.0 is only just out, seems unlikely ๐
23:34:40 <efessel> That's just a quick patch & compile ๐
23:41:55 *** tokai|noir has joined #openttd
23:41:55 *** ChanServ sets mode: +v tokai|noir
23:44:06 <_glx_> we had reports from other servers too
23:48:51 *** tokai has quit IRC (Ping timeout: 480 seconds)
23:56:07 *** Wormnest has quit IRC (Ping timeout: 480 seconds)
continue to next day โต