160 points by xena 1 days ago | 159 comments Rendered at 12:18:22 GMT+0000 (Coordinated Universal Time) with Cloudflare Workers.
Am I missing something here? Yes, if you use a feature that intentionally inserts the build time and date into the code, then every build is going to be different. That's the whole point of these macros. It's a feature. If you don't want that behavior, don't use that feature.
It's meant to be a trivial counterexample. Like saying "-1" to the claim "there's no number smaller than 0" to someone who's not familiar with math, the author is saying "build-dependent macros" to the claim "compilers are deterministic" to someone who might not be familiar with compilers.
But usually the realization follows the initial intent by several weeks, if not months! Your comment shines as the embodiment of hindsight is 20/20.
But that's exactly what I don't get. How can that be considered "accidental"? How can any thinking person not realize that putting the build time into the compiled image will make every build different because, you know, different builds happen at different times? Has software engineering really been dumbed down so much that this is not immediately obvious? It feels like a mechanic doing an oil change and being surprised by having all the oil drain out if they neglect to put the drain plug back in.
In a parallel non–Euclidean dimension, perpetrators go the other way and have their victims build with -j1 reproducible builds.
You might accidentally end up including it transitively and suddenly your binary is nondeterministic.
Isn't that true for web frameworks too? Usually they'll only target unix, but if they target windows and macos, then they work on those platforms too? Or am I misunderstanding what you're trying to say here?
If you update the OS, hardware, or compiler, you will see only a few changes. If you update the web framework, you may see breakages, API deprecations or whatever. You may want to move to a different web framework entirely. TBH I don't really know, I don't know web programming beyond basic HTML/JavaScript. That's what they say, though.
In the case of a desktop application, unless you build things against OS libraries, your "platform" is also typically a framework, like Qt or AppKit or whatever you end up using. That's the equivalent of the "web framework" in the web world.
Basically, it goes "Your app > GUI framework > other/OS libraries" for desktop apps, "Your app > web framework > other/OS libraries" for web applications.
Then in both approaches you can of course skip the framework if you want, no one is forcing you to use those in either of the cases.
Edit: I realize now we might be talking past each other, I was under the understanding that "web framework" is about backend web frameworks, but maybe you actually meant frontend frameworks running client-side. If so, replace "other/OS libraries" with "browser runtime" and my comment more or less still makes sense :)
That's not what I consider "low level programming". I don't use any of these.
Yes, you can try to do plain JavaScript. Honestly, JavaScript is a much less pleasurable environment than a compiled, statically typed procedural language. The main advantage of the browser is that you get a viewport, font rendering, etc. with almost no setup required at all.
So if, say, C linking to Xorg libraries and drawing a GUI that way isn't low-level programming, then what is? Is only assembly "low level programming", or what?
Meh, JavaScript is fine, like most dynamic Algol/C-like languages. Could be worse, could be TypeScript :)
But personally, I find the browser environment a hell of a lot easier to target than cross-platform native application development. Then again, I'm a web developer who started doing native apps, not the other way around, which might be why.
1. Allows access in reasonable time/battery use to me on my phone
2. Poses any meaningful challenge to the most compute-resourced organizations on the planet
I wonder how many cumulative hours of human life have been wasted waiting on Anubis.
I disagree with a lot of the decisions around the design of Anubis... but resisting the industry's current drive to ruin as much of the good-faith resource donation from others as it can is an admirable objective.
The point isn't to increase the amount of work required to the point of exhaustion, it's to require that scripts be able to offer the exact same feature set that browsers offer. The point isn't to make it impossible, it's to make it more expensive than free.
Anubis isn't trying to prevent all scraping, it's trying to reduce the abuse just enough that real requests get their fair share. You don't need to outcompute the botnet, just slow them down a little.
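For anyone who hasn't seen how "more expensive than free" works in practice, here is a minimal hashcash-style sketch in Python. To be clear, this is not Anubis's actual code: the function names, the challenge string format, and measuring difficulty in leading zero hex digits are all assumptions made for illustration.

```python
import hashlib
from itertools import count


def solve(challenge: str, difficulty: int) -> int:
    """Search for the first nonce whose SHA-256 hex digest starts with
    `difficulty` zero digits. Expected cost grows as 16 ** difficulty."""
    prefix = "0" * difficulty
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Checking a solution costs a single hash, however long the search took."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)
```

The asymmetry is the whole trick: the client pays for the search, the server pays for one hash, and raising the difficulty by one digit multiplies the client's expected work by 16.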
I hate seeing the Anubis interstitial too, I've complained about it publicly already too. But it doesn't come close to the frustration of waiting 10s for an SPA to load all of the routes it'll never use before the first redraw. Clearly our industry has also decided latency is a good thing.
"How dare that mugging victim fight back".
The choice is not between Anubis and no Anubis, the choice is between Anubis and my website going offline because I can't afford the $400/month that AI scrapers would cost me (yes, I checked, and yes, that's the real figure) if Anubis wasn't in front.
If you then spam requests you might get another, harder, challenge appear.
If you have a data center IP and look like bot traffic you get a hard challenge out the gate.
AFAIU after looking at their docs several months ago.
https://xeiaso.net/blog/2024/much-ado-about-nothing/
Compilers literally made your project possible!
I would consider that a bug tbh
And it may not have crossed their mind that the clang behavior is a bug after finding a workaround. I'd also assume compilers do things "no mere mortal can fully comprehend on their own".
I'll go file it upstream after work today.
llvm/test/CodeGen/WebAssembly/cfg-stackify-eh.ll and friends are existing tests that you can kinda mangle if you want to get a good reproducer.
Also take a look at https://discourse.llvm.org/t/reverse-iteration-bots/72224
Otherwise, happy to put my reproducer/patch on the bug after you file it!
The internal programming guide also says which collections to use for deterministic iteration order: https://llvm.org/docs/ProgrammersManual.html#llvm-adt-setvec...
So definitely a bug here.
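A rough Python analogy for this class of bug, not LLVM code: iterating a hash-based container when the output order matters. The function names are made up for the example. In Python, string hashes are randomized per process by default, so the order of a set of strings can differ from run to run, while a dict keeps insertion order (which is roughly the guarantee an insertion-ordered set container gives you).

```python
def emit_in_hash_order(items):
    """Order-sensitive output built from a plain set. For strings, the
    resulting order can change between interpreter runs because of
    hash randomization."""
    return list(set(items))


def emit_in_insertion_order(items):
    """Deduplicate while keeping first-seen order. Dicts preserve insertion
    order, so the result is the same on every run."""
    return list(dict.fromkeys(items))
```

Both return the same elements; only the second promises the same sequence every time, which is what you want for anything that feeds into emitted code.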
https://reproducible-builds.org/docs/source-date-epoch/
(although Nix sets it as a default)
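In a build script, honoring it looks roughly like the sketch below. The function name and the `environ` parameter are mine, added so the behavior is easy to test; the only thing taken from the spec is that `SOURCE_DATE_EPOCH` holds an integer number of seconds since the Unix epoch.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp(environ=None):
    """Return the timestamp a build should embed.

    Uses SOURCE_DATE_EPOCH when it is set, otherwise falls back to the
    current time, which is what makes every build differ.
    """
    environ = os.environ if environ is None else environ
    epoch = environ.get("SOURCE_DATE_EPOCH")
    seconds = int(epoch) if epoch is not None else int(time.time())
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```

With the variable set, two builds of the same source embed the same date; without it, you are back to the wall clock.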
And I speak as being generally very critical of cryptos, but here rewarding the website owner with some cents to have access seems fair, and resolves the traditional issues about micro-payments.
Wasn't there some famous home-computing project that recently stopped because of that? I thought it was Folding@home but that seems to still be going.
We see this with Recaptcha: when it was first launched, some news sites praised it as making good use of what would have otherwise been wasted human effort. But eventually I started to see negative comments along the lines of how Recaptcha is just extracting free work to train self-driving cars, never mind the part about stopping bots. Since Recaptcha is now sometimes non-interactive, I am not sure if that data is still used for training, other than to improve Recaptcha itself, but the negative sentiment still holds whether that data is used or not.
P.S. Great to see you, omoikane
Were Anubis to add crypto mining, even if all the revenue went to Techaro, you could still say "the enshittification is a shame, but at least they're not Google". Using the compute for BOINC protein folding somehow should be unobjectionable.
If it mined crypto instead of just burning clock cycles, then that could not in any way serve to lower its ranking in my book. It's already at minimum.
That being said I don't know any crypto that would technically fit as a lightweight PoW.
https://hacks.mozilla.org/2020/02/securing-firefox-with-weba...
Do people really do that? -- disable, not just using old browsers with no wasm.
Disabling wasm while keeping JS enabled is a configuration I can't understand.
https://xeiaso.net/characters/
But just taking this as-is, what is the environmental impact likely to be when multiplied up by the number of users? Proof of work is a bad idea.
Getting in the maze influences your client's challenge difficulty.
The README itself admits that this is a nuclear option. https://github.com/TecharoHQ/anubis
> I decided to take inspiration from the legendary talk The Birth and Death of JavaScript and just recompile the WebAssembly to JavaScript.
So what do you do when the client has JavaScript disabled?
Here, since any whatwg cartel web engine is an issue, the author should not bother.
I for one enjoyed the article and understand what you're getting at.
This is true but doesn't seem relevant; does replacing the word "compiler" with "build chain" change anything? Because that seems like the clear meaning.
If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.
If you want to have users trust that someone else hasn't modified it, then sign it with your identity.
Being able to reproduce the binary from the source code and being able to verify that it's the same as the original is quite important in some contexts.
I disagree. The contexts that people come up with are purely theoretical, and are not practically important. Please do try and convince me otherwise by sharing such a context. From my view the juice of trying to accomplish this is nowhere near worth the squeeze.
Military context: a government would want to review the code and compile themselves. Provide a hash of the target binary to ensure they've compiled it correctly.
SDLC: provide auditors with _proof_ that the tested binary is indeed coming from the audited code
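The check itself is tiny once the build is reproducible. A sketch, with hypothetical function names and SHA-256 picked as an assumption (neither scenario above specifies a hash):

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex digest of a build artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


def matches_published(artifact: bytes, published_hex: str) -> bool:
    """True when a locally built artifact hashes to the published digest."""
    return hmac.compare_digest(sha256_hex(artifact), published_hex.lower())
```

This only tells you anything if the build is bit-for-bit reproducible; a single embedded timestamp makes the comparison fail even when the source is identical.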
The government doesn't want to do this. A lot of the time the government doesn't even get the source code in the first place.
>provide auditors with _proof_ that the tested binary is indeed coming from the audited code
This can be done by showing the auditor how one's CI is set up to build checked-in code and sign it.
SDLC: Traceability is more important than reproducibility. Keeping logs is more important than deterministic build outputs
Why not build your own binaries and be done with that. If you don’t trust the compiler or the machine doing the build, just build the code yourself.
Also useful for checking that a binary containing GPLed code does actually correspond to its published source.
That tooling is a compiler. The higher level, the better chance the LLM can be steered to good output. Machine code is hopeless, don’t bother.
Just like the difference between 'him' and 'her' is inscrutable taken out of context, but that's why LLMs have embeddings they use to store contextual information in huge vectors and have an input processing phase during which the input tokens gain contextual information, so that the LLM knows that 'him' refers to 'Peter' and 'her' refers to 'Jane'. Likewise it will be able to infer that $+15 is the 'success' branch of control flow and $+16 is the fail branch.
The way computer programs and natural language differ is that in language, words with absolute or at least very constrained meanings are common, while code is basically a pure manipulation of symbols, with variable and function names being meaningless helpers, and the actual meaning needs to be deduced from the way these symbols are manipulated.
In fact, I think LLMs are actually surprisingly good at this kind of abstract symbol manipulation, and are far less bothered than humans with 'add rax, rcx' by the fact that the meaning of 'rax' and 'rcx' are heavily contextual, as they dedicate a lot of time to build up rich contextual information that might be different in every place these symbols appear.
The context is pretty flexible, like "Do you know Jim? I saw him at the store." Or, "Do you know Jim? Fifteen days ago, I saw him at the store." There’s a relatively small universe of pronouns (him, her, that, who, etc) and the pronouns refer to a token nearby (in this case, Jim).
With machine code, there’s a massive set of jump offsets, and the referent isn’t a token, but rather a location to start processing.
> In fact, I think LLMs are actually surprisingly good at this kind of abstract symbol manipulation,
When you’re manipulating machine code, you’ve stepped away from abstract symbol manipulation and you’re just manipulating byte values now.
I don’t think your argument here is convincing. Maybe you can point to a demo or some architecture where this works. But my sense is this—once you start designing a harness to make LLMs capable of writing machine code, or designing an architecture for LLMs to write machine code, something in your implementation probably looks like an assembler, and something in your internal tokenization of the machine code probably looks like a higher-level language.
Also, there are dynamic compilers where the shape of the machine code changes as the code executes, and each single execution will certainly generate different sequences, depending on the program execution and where it is running.
Deterministic JIT compiler code generation, at least on optimising ones, is not a solved problem.
You can have LLMs help you optimize code but I don’t think you can do this unattended for non-trivial code.
I don't see why that's the case. An LLM trained on binary would totally see it, no?
Also the tool can also be running the test and a debugger.
It would not. You find the correct version by counting the number of bytes to the destination. LLMs are famously bad at this kind of problem (counting).
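To make the counting concrete, here is a sketch of what an assembler does for an x86 short jump (opcode 0xEB followed by a signed 8-bit displacement measured from the end of the jump instruction). The function name and the list-of-lengths representation are made up for the example.

```python
def short_jump(jump_index: int, target_index: int, lengths: list[int]) -> bytes:
    """Encode an x86 short jump between two instructions.

    `lengths` holds the byte length of every instruction in program order,
    including the 2-byte jump itself. The displacement is counted from the
    end of the jump to the start of the target instruction.
    """
    starts = [0]
    for n in lengths:
        starts.append(starts[-1] + n)
    end_of_jump = starts[jump_index] + lengths[jump_index]
    displacement = starts[target_index] - end_of_jump
    if not -128 <= displacement <= 127:
        raise ValueError("target out of range for a short jump")
    return bytes([0xEB, displacement & 0xFF])
```

Change the length of any instruction between the jump and its target and the encoded byte changes with it, so whoever writes raw machine code has to redo this arithmetic on every edit.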
> Also the tool can also be running the test and a debugger.
The test needs to provide a good amount of signal. That’s too hard if you are throwing machine code at the wall.
In order for debuggers to work, you need some kind of model that describes what the code should do and what state the computer should be in after each instruction. That model is high-level code.
I can understand the intuitive appeal of training LLMs with machine code, but all of my experience with LLMs suggests that they are incredibly ill-suited to the task, and we just don’t have the capacity to train them to make useful machine code.
It then means you have 2 parties focussing on the big picture and no one focussing on the details.
Yesterday on a whim I tried asking a local model a question about kanji that look different in different fonts despite being the same character (to the point of strokes appearing in completely different directions), and the model hallucinated imgur links to images of the characters. If imgur could work with approximate references to data maybe that would have worked.
It applies to humans too. Calculus is “simple” but it takes something like sixteen years to train a human to do it, if all goes well. Meanwhile, most humans think that inverse kinematics is, like, the easiest thing in the world (it’s a super complicated task).
You’re only evaluating “harder” or “easier” based on the perspective of somebody who has a mammalian brain with millions of years of selective pressure to make it suitable for solving inverse kinematics problems.
The point here is that when we start constructing agents or tools with different architectures to ourselves, it makes sense to reevaluate notions of whether something is ‘hard’ or ‘easy’. LLMs are bad at counting not because counting is hard, but because their architecture makes it hard.
Also, I suspect you're comparing dissimilar things, because in one case you're looking at a brain doing both inverse kinematics and "calculus" (sense 1), and in the other you're looking at a computer doing both inverse kinematics and "calculus" (sense 2). The kind of calculus a CAS does is not the same kind that a human does. It's less versatile, for one.
>The point here is that when we start constructing agents or tools with different architectures to ourselves, it makes sense to reevaluate notions of whether something is ‘hard’ or ‘easy’.
Well, no, because when someone says that calculus is hard and moving their arms is easy, they're not talking about how hard it was to create each functionality, they're talking about how hard it is to employ each. We would need to ask a computer how hard it thinks the tasks it does are to do.
I don’t think the metric is at all reasonable, and the fact that it’s “objective” doesn’t make up for its other shortcomings. I don’t think we have a basis for agreement here—I think you’ve framed the argument in a way that supports a “calculus is hard” conclusion merely by defining “hard” in such a way that supports your conclusion from the start, but I think that approach is only useful as a way to win an argument, and we’ve failed to share ideas once you start using that tactic.
It seems to me you're the one who first did that by equivocating between what is easier to do and what is easier to make a machine do.
>we’ve failed to share ideas once you start using that tactic
Well, I certainly don't agree with that.