Hacker News — vinext + Cloudflare Workers

▲TIPSv2: Advancing Vision-Language Pretraining with Enhanced Patch-Text Alignment (gdm-tipsv2.github.io)

23 points by gmays 4 days ago | 1 comment

jiggawatts 4 days ago [-]

I just tested their online demo with a challenging photo of a snowboarder in dark clothing in front of a dark forest. The low contrast makes it difficult to distinguish the snowboarder's black helmet from the shadowed trees immediately behind and around it.

DINOv3 segmented this perfectly, as well as a human might. TIPSv2 cut the head off and marked it with the same PCA values as the forest. Similarly, TIPSv2 "split" the snow in the foreground into two different PCA values despite it being visually (and physically) contiguous and not significantly distinct.

Rendered at 14:58:56 GMT+0000 (Coordinated Universal Time) with Cloudflare Workers.