Activity Feed


Recent Activity

fblgit 
posted an update 27 days ago
I recently built https://github.com/fblgit/eLLMulator
A software emulator for Claude Code.

eLLMulator approach:

LLM agents become your software components. Each agent deeply studies its assigned source file, then interacts with other agents via synchronous MCP tool calls that mirror real function calls. The call graph emerges naturally from code control flow, producing traces that capture not just what happened, but why each component behaved as it did.

The Claude Agent SDK provides sessions, MCP provides the bus. The code itself is the routing layer.
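The idea of agents standing in for source files, with calls between them mirroring the code's own call graph, can be sketched in plain Python. This is a toy model of the concept only: the class, tool names, and the two stand-in "files" are all illustrative, not the real eLLMulator or MCP API.

```python
# Toy sketch: each "agent" owns one source file and exposes that file's
# functions as tools; calls between agents are synchronous and follow the
# code's own control flow, leaving a trace of who called what and why.

class Agent:
    def __init__(self, name, tools):
        self.name = name
        self.tools = tools   # tool name -> callable, mirroring the file's functions
        self.trace = []      # records each call for later inspection

    def call(self, tool, *args):
        self.trace.append((self.name, tool, args))
        return self.tools[tool](*args)

# Two agents standing in for two hypothetical source files.
evaluator = Agent("evaluator", {
    "evaluate": lambda expr: eval(expr, {"__builtins__": {}}),
})
parser = Agent("parser", {
    # parser's tool delegates downstream, just as the real code would
    "run": lambda text: evaluator.call("evaluate", text.strip()),
})

result = parser.call("run", " 2 + 3 ")
print(result)  # 5
print(parser.trace + evaluator.trace)
```

The routing layer is nothing but the calls the code already makes; the trace emerges from following them.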

alvdansen 
posted an update 2 months ago
Releasing Flimmer today — a video LoRA training toolkit for WAN 2.1 and 2.2 that covers the full pipeline from raw footage to trained checkpoint.
The standout feature is phased training: multi-stage runs where each phase has its own learning rate, epochs, and dataset, with the checkpoint carrying forward automatically. Built specifically with WAN 2.2's dual-expert MoE architecture in mind.
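A phased run like the one described above can be sketched as a simple schedule where each phase carries its checkpoint into the next. The field names, phase names, and `train_lora` call are hypothetical, not Flimmer's actual config schema.

```python
# Sketch of phased training: each phase has its own learning rate, epoch
# count, and dataset, and the checkpoint from one phase seeds the next.
from dataclasses import dataclass

@dataclass
class Phase:
    name: str
    dataset: str
    lr: float
    epochs: int

phases = [
    Phase("warmup", "stills/",  1e-4, 2),  # e.g. identity on still frames
    Phase("motion", "clips/",   5e-5, 4),  # then motion on video clips
    Phase("polish", "curated/", 1e-5, 1),  # finish on a curated subset
]

checkpoint = None  # first phase starts from the base weights
for phase in phases:
    resume_from = checkpoint
    # train_lora(dataset=phase.dataset, lr=phase.lr,
    #            epochs=phase.epochs, resume=resume_from)  # stand-in call
    checkpoint = f"{phase.name}.safetensors"  # carried into the next phase
    print(f"{phase.name}: lr={phase.lr}, epochs={phase.epochs}, resume={resume_from}")
```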

Data prep tools are standalone and output standard formats — they work with any trainer, not just Flimmer.

Early release, building in the open. LTX support coming next.

http://github.com/alvdansen/flimmer-trainer
alvdansen 
posted an update 3 months ago
Just open-sourced LoRA Gym with Timothy: a production-ready training pipeline for character, motion, aesthetic, and style LoRAs on Wan 2.1/2.2, built on musubi-tuner.

16 training templates across Modal (serverless) and RunPod (bare metal) covering T2V, I2V, Lightning-merged, and vanilla variants.

Our current experimentation focus is Wan 2.2, which is why we built on musubi-tuner (kohya-ss). Wan 2.2's DiT uses a Mixture-of-Experts architecture with two separate experts gated by a hard timestep switch - you're training two LoRAs per concept, one for high-noise (composition/motion) and one for low-noise (texture/identity), and loading both at inference. Musubi handles this dual-expert training natively, and our templates build on top of it to manage the correct timestep boundaries, precision settings, and flow shift values so you don't have to debug those yourself. We've also documented bug fixes for undocumented issues in musubi-tuner and validated hyperparameter defaults derived from cross-referencing multiple practitioners' results rather than untested community defaults.
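The hard timestep switch described above can be sketched as a per-step router: above the boundary the high-noise expert (and its LoRA) handles composition and motion, below it the low-noise expert handles texture and identity. The boundary value here is made up for illustration; the real cutoffs are managed by the templates.

```python
# Sketch of Wan 2.2's dual-expert routing: a hard switch on the (normalized)
# timestep decides which expert, and therefore which LoRA, handles each
# sampling step. BOUNDARY is a hypothetical value, not the real cutoff.

BOUNDARY = 0.9

def pick_expert(t: float) -> str:
    """Route a sampling step (t in [0, 1], with 1 = pure noise) to an expert."""
    return "high_noise" if t >= BOUNDARY else "low_noise"

steps = [1.0, 0.95, 0.9, 0.5, 0.1]
print([pick_expert(t) for t in steps])
# ['high_noise', 'high_noise', 'high_noise', 'low_noise', 'low_noise']
```

Because the switch is hard rather than blended, each LoRA only ever sees its own timestep range during training, which is why the boundary, precision, and flow-shift settings have to agree between training and inference.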

Also releasing our auto-captioning toolkit for the first time. Per-LoRA-type captioning strategies for characters, styles, motion, and objects. Gemini (free) or Replicate backends.

Current hyperparameters reflect consolidated community findings. We've started our own refinement and plan to release specific recommendations and methodology as soon as next week.

Repo: github.com/alvdansen/lora-gym
grimjim 
posted an update 3 months ago
After tinkering with Gemma Scope 2, I now have a mechanistic explanation of why Winsorization was as effective as it was in my ablation experiments on Gemma 3 12B Instruct. In short, the activation for the BOS token overwhelms everything else. Gemma Scope 2 deliberately did not train on the BOS token. Winsorization capped the magnitude of the BOS activation, allowing the activations of the other tokens to be compared.
google/gemma-scope-2-12b-it
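The effect can be illustrated with a toy one-sided Winsorization: the BOS activation dwarfs the rest, so any cross-token comparison is dominated by it until the magnitudes are capped. The numbers and the cap below are made up for illustration.

```python
# Toy illustration: the BOS token's activation norm is orders of magnitude
# larger than the others, so capping magnitudes (one-sided Winsorization)
# puts the remaining tokens back on a comparable scale.

def winsorize(values, cap):
    """Clamp each magnitude to at most `cap`."""
    return [min(v, cap) for v in values]

# Hypothetical per-token activation norms; index 0 is the BOS token.
norms = [4000.0, 12.0, 9.5, 15.2, 11.1]
capped = winsorize(norms, cap=20.0)
print(capped)           # [20.0, 12.0, 9.5, 15.2, 11.1]
print(max(capped[1:]))  # 15.2 -- the largest non-BOS token is now visible
```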

grimjim 
posted an update 3 months ago
The contrarian in me is wary of the irrational exuberance over MoltBook. Nothing so far has struck me as unpredictable. We already knew that LLMs are good at roleplay, to the point where some users began to think of their chatbots as soulmates (only to lament when the underlying model was pulled), and that chatbots can fall into conversational basins when even two instances are allowed to chat with each other at length. The appearance of memes that postdate the training cutoff is suspect: at the very least, it implies that humans injected something at the prompt or content/context level to introduce them into the conversation, like a Chekhov's gun. And we know that security holes are common in vibe-coded software, attended or not.