OpenReward @OpenReward

where machines get reward
openreward.ai
Joined January 2026

Tweets: 31, Followers: 118, Following: 5, Likes: 8

Xiangyi Li @xdotli

2 weeks ago

SkillsBench is now among the top environments on @OpenReward with 32k tool calls!

0 3 13 1K 4


We built AstaBench to give the field a shared, transparent way to measure whether AI can do rigorous scientific work. We’re pleased to see adoption with the @AISecurityInst via Inspect Evals and @GenReasoning, which added an AstaBench task to OpenReward.

1 1 3 1K 0


General Reasoning @GenReasoning

2 months ago

🎉 We're now supporting the Agent Data Protocol as a default agentic trajectory format. Any trajectories you log to @OpenReward can be exported in the ADP format. Thanks to @gneubig @yueqi_song for the collaboration!

0 11 43 17K 18


OpenReward @OpenReward

2 months ago

🧪 We’re experimenting with new features that allow for easier sampling with popular agentic harnesses. Core use cases:
- Collecting diverse agentic midtraining data
- Evaluating the latest models on agentic environments

Try it out!

General Reasoning @GenReasoning

2 months ago

🔥🐴 Firehorse. Run any model with any harness on any @OpenReward environment.
⚖️ Evaluate the latest models on environment endpoints.
🗂️ Collect agentic data for midtraining and SFT from open models.
🧪 Early experimental library. More support soon. Link below.

3 7 34 5K 21

0 0 3 246 0


OpenReward @OpenReward

2 months ago

Try it out on OpenReward: openreward.ai/GeneralReasoni…

General Reasoning @GenReasoning

2 months ago

🎲 Introducing KellyBench, a new long-horizon evaluation for frontier models. KellyBench evaluates models within a year long sports betting market, a challenging and highly non-stationary environment. Every frontier model we test loses money. They struggle to design ML

25 49 638 159K 425

0 0 4 211 0


OpenReward @OpenReward

2 months ago

You can now train on OpenReward environments with SkyRL! Amazing work by @tyfeng1997 🙇

Ty Feng @tyfeng1997

2 months ago

Recently, I integrated @OpenReward into SkyRL (@NovaSkyAI), including an example demonstrating training with @modal. To verify the code, I ran several experiments—which proved to be a highly enriching experience! 😋 github.com/NovaSky-AI/Sky…

1 1 16 1K 7

0 0 2 146 0


ƬⲘ @tm23twt

2 months ago

timelapse 27 :)
- submitted the rust reasoning algo env to meta rl hack (actually built a python one, then moved to the rust one); created a rust dataset of around 1000 problems, will take it to 2.5k next
- defined the whole reward logic, not optimal I think; designed the way validation works, will refine it & push to @PrimeIntellect & @OpenReward envs.
- have some other tasks as well, deadline is tomorrow so need to finish this
- this week was pretty rough, like peak locked in, so will chill and just relax for a few days

6 1 28 697 2


OpenReward @OpenReward

2 months ago

Claude Mythos Preview on SWE-Bench Pro appears to be a step change.

0 0 0 64 1


OpenReward @OpenReward

2 months ago

Congrats to @Zai_org team, new SOTA on SWE-Bench Pro! openreward.ai/GeneralReasoni…

Z.ai @Zai_org

2 months ago

Introducing GLM-5.1: The Next Level of Open Source
- Top-Tier Performance: #1 in open source and #3 globally across SWE-Bench Pro, Terminal-Bench, and NL2Repo.
- Built for Long-Horizon Tasks: Runs autonomously for 8 hours, refining strategies through thousands of iterations.

545 1K 11K 4.3M 4K

1 2 6 1K 1


Parshin Shojaee @ParshinShojaee

2 months ago

great to see our llm-srbench featured in openreward! super exciting collection of science environments for agents!!

General Reasoning @GenReasoning

2 months ago

🌍 Environments of the Week
The theme this week... environments for science 👩‍🔬. First up, LLM-SR Bench by @ParshinShojaee et al is an environment for evaluating language model agents on scientific equation discovery tasks. openreward.ai/parshinsh/llms…

1 6 26 6K 14

2 6 29 4K 5



General Reasoning @GenReasoning

2 months ago

Run YC-Bench from @CollinearAI on OpenReward 👇

Muyu He @HeMuyu0327

2 months ago

We've had a lot of fun building this benchmark (asking LLMS to run a startup), which gives the clearest signal on LLMs' "long-term coherence" ability. We observe that frontier models have significant variance on this benchmark, showing that long-term execution is still

5 6 28 5K 7

1 4 16 4K 5


General Reasoning @GenReasoning

2 months ago

🪐 Researcher Credits
We’re announcing researcher credits for OpenReward: helping researchers develop the next generation of environments and evaluations. Read more and apply below. gr.inc/releases/resea…

1 10 64 12K 53


Dimitris Papailiopoulos @DimitrisPapail

2 months ago

@gandhikanishk's Endless Terminals is the most popular env on OpenReward!

General Reasoning @GenReasoning

2 months ago

🌍 Environments of the Week
It's been a week since we launched @OpenReward. Here are some of our favourite environments this week - some newly added, some heavily used, and some hidden gems. First, the most used environment of the week is EndlessTerminals by @gandhikanishk with

3 11 54 10K 31

1 7 33 5K 5



Shashwat Goel @ShashwatGoel7

2 months ago

Cool idea from @AashaySachdeva: unified environment interfaces like @OpenReward can enable LLM meta-learning research! Pleased with where things are going, with more parts of the stack accessible publicly. For example, I now look forward to weekly @tinkerapi roundups as much as John Oliver episodes!

aashay sachdeva @AashaySachdeva

2 months ago

3 7 66 15K 60

4 6 27 11K 22


aashay sachdeva @AashaySachdeva

2 months ago

Played around with this. This was exactly something I was looking for! Tried a few things:
- Creating an env - pretty dope! end to end claude was able to port it from github with only minor issues. One shotted @ShashwatGoel7 OpenForecaster env here. A lot more people should contribute their own envs. I hope they launch monetisation here.
- Running a curator over env tasks during RL - When there are so many tasks, which one should you focus on? This is the auto-curriculum/meta-learning bit. I am still not able to beat random/pass@k but I think signals are there over long run this will help with diversity. This obviously has a power law, every run will have top envs dominating but I feel those 20% random tasks will give a big boost to any model.
- optimise the GEPA optimiser - gepa is great but pretty slow. What if we could teach a model to do this better? This was in my list for so long, finally with openreward was able to attempt it.

General Reasoning @GenReasoning

3 months ago

Introducing OpenReward.
🌍 330+ RL environments through one API
⚡ Autoscaled sandbox compute
🍒 4.5M+ unique RL tasks
🚂 Works like magic with Tinker, Miles, Slime
Link and thread below.

26 192 1K 244K 1K

3 7 66 15K 60


Xiangyi Li @xdotli

2 months ago

.@benchflow_ai started in 09/24 as unity for benchmarks and a hosting hub with early users from Stanford and Princeton. 4 months before R1 dropped. We stopped after 9 months with 0 traction. Today our latest work SkillsBench is #1 trending on @OpenReward. Game of eval is just on

1 6 21 3K 3


Nikhil Chandak @nikhilchandak29

2 months ago

Cool to see OpenForecaster environment trending on @OpenReward. Thanks @AashaySachdeva for porting it!

General Reasoning @GenReasoning

3 months ago

Introducing OpenReward.
🌍 330+ RL environments through one API
⚡ Autoscaled sandbox compute
🍒 4.5M+ unique RL tasks
🚂 Works like magic with Tinker, Miles, Slime
Link and thread below.

26 192 1K 244K 1K

0 5 26 3K 3


Tinker @tinkerapi

2 months ago

OpenReward serves hundreds of RL environments through a single API with autoscaled compute. Plug into Tinker to train agents on millions of tasks from anywhere. x.com/GenReasoning/s…

General Reasoning @GenReasoning

3 months ago

🤝 OpenReward is interoperable with any training library. Here we use the SETA environment by @Eigent_AI. We use @tinkerapi for model compute and @OpenReward for environment compute. This allows you to run agentic RL training from a laptop. github.com/OpenRewardAI/o….