NHacker Next
▲Decoupled DiLoCo: Resilient, Distributed AI Training at Scale (deepmind.google)
This paper proposes a work-partitioning scheme that removes a constraint which makes parallelizing AI training inefficient. Work partitioning itself isn't a novel idea, but this particular scheme is.
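For readers unfamiliar with DiLoCo, the rough shape of the training loop is sketched below: workers take many local optimizer steps between infrequent synchronization rounds, which is what makes the partitioning tolerant of slow or unreliable links between nodes. This is a toy illustration of the generic DiLoCo recipe only, not the paper's decoupled variant; the worker count, learning rates, and quadratic objective are assumptions chosen for brevity.

```python
# Minimal sketch of a DiLoCo-style outer loop, assuming the standard recipe
# (many local steps, then a rare outer sync). All hyperparameters and the toy
# objective are illustrative, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
dim, num_workers = 10, 4
outer_rounds, inner_steps = 20, 50
inner_lr, outer_lr, beta = 0.05, 0.7, 0.9

# Each worker sees a different data shard, modeled as a different target.
targets = rng.normal(size=(num_workers, dim))

def inner_grad(params, target):
    # Gradient of the toy loss 0.5 * ||params - target||^2
    return params - target

global_params = np.zeros(dim)
outer_momentum = np.zeros(dim)

for _ in range(outer_rounds):
    deltas = []
    for w in range(num_workers):
        # Each worker trains independently from the same starting point;
        # no communication happens during these inner steps.
        local = global_params.copy()
        for _ in range(inner_steps):
            local -= inner_lr * inner_grad(local, targets[w])
        deltas.append(global_params - local)  # the "pseudo-gradient"
    # Infrequent sync: average the pseudo-gradients and apply an outer
    # optimizer step (plain momentum here; DiLoCo uses Nesterov momentum).
    avg_delta = np.mean(deltas, axis=0)
    outer_momentum = beta * outer_momentum + avg_delta
    global_params -= outer_lr * outer_momentum

print("distance to mean target:", np.linalg.norm(global_params - targets.mean(axis=0)))
```

The point of the structure is that the expensive cross-worker communication happens once per outer round rather than once per gradient step, which is the constraint the comment above alludes to.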