digitalcourage.social is one of the many independent Mastodon servers you can use to participate in the fediverse.
This instance is run by Digitalcourage e.V. for the general public. So that we can do this sustainably, we collect a yearly advance contribution of €1/month via SEPA direct debit.

Server stats:

812 active users

#reinforcementlearning

2 posts · 2 participants · 0 posts today

Travis F W:
@remixtures yeah the LLMs we know are not trained with #reinforcementLearning afaict. Good question, actually.

IT News:
Is AI really trying to escape human control and blackmail people? - In June, headlines read like science fiction: AI models "bla... - https://arstechnica.com/information-technology/2025/08/is-ai-really-trying-to-escape-human-control-and-blackmail-people/
#goalmisgeneralization #reinforcementlearning #largelanguagemodels #alignmentresearch #palisaderesearch #aisafetytesting #machinelearning #jeffreyladish #generativeai #aialignment #aideception #claudeopus4 #aibehavior #airesearch #o3model

The Internet is Crack:
Intelligence Is a Gray Area
Professor Michael Littman joins The Internet Is Crack to unpack AI and reinforcement learning, and to challenge what we really mean when we say a machine is "intelligent."
🎧 https://youtu.be/N3TpwsMVeRg
#AI #ArtificialIntelligence #ReinforcementLearning #TechPodcast #EthicsInAI #podcast

Sarah Lea:
Want to truly understand how AI learns to make better decisions?
:blobcoffee: Start with the concept of exploration vs. exploitation and dive into one of the most basic, but super important, ideas in Reinforcement Learning: Multi-Armed Bandits.
:blobcoffee: It's a simpler way to start understanding the basics behind RL. And it might change how you see AI.
→ https://towardsdatascience.com/simple-guide-to-multi-armed-bandits-a-key-concept-before-reinforcement-learning/
#ai #ki #reinforcementlearning #machinelearning #datascience #DataScientist #learning #decision

The Internet is Crack:
🎙️ Artificial Intelligence: What It Really Is (and Isn't)
We had the pleasure of speaking with Prof. Michael Littman (Brown University) about the fundamentals of AI and reinforcement learning. We explore whether today's AI systems are truly "intelligent", or just faking it impressively well.
🎧 Listen here: https://youtu.be/N3TpwsMVeRg
#ArtificialIntelligence #AI #ReinforcementLearning #TechPodcast #OpenScience #STEM #MachineLearning #FOSSfriendly #podcast

Sarah Lea:
Can you remember learning to walk as a baby? You didn't read a manual. Neither does an AI agent.
Reinforcement Learning (RL) isn't about knowing the correct answer. It's about learning through trial and error, by interacting with an environment and receiving feedback.
That's how AlphaGo defeated a world champion: it first learned from expert games, then played against itself millions of times, using RL to get better with each game. That's how it mastered Go.
#machinelearning #ai #ki #google #reinforcementlearning #alphago #datascience #datascientist

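To make that trial-and-error loop concrete, here is a minimal sketch of an agent improving purely from environment feedback. The corridor environment, reward, and Q-learning parameters below are illustrative assumptions, not taken from the post or from AlphaGo.

```python
import random

# Toy environment: a corridor of 5 states; reaching the right end gives reward 1.
N_STATES = 5
ACTIONS = [-1, +1]  # move left, move right

def step(state, action):
    """Apply an action, return (next_state, reward, done)."""
    next_state = max(0, min(N_STATES - 1, state + action))
    if next_state == N_STATES - 1:
        return next_state, 1.0, True   # goal reached
    return next_state, 0.0, False

# Q-table: estimated return for each (state, action) pair, learned by trial and error.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.1, 0.9, 0.1

for episode in range(500):
    state = 0
    for t in range(100):                      # cap episode length
        values = [Q[(state, a)] for a in ACTIONS]
        if random.random() < epsilon or values[0] == values[1]:
            action = random.choice(ACTIONS)   # explore, or break ties randomly
        else:
            action = ACTIONS[values.index(max(values))]  # exploit
        next_state, reward, done = step(state, action)
        # The environment's feedback updates the value estimate (TD update).
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state
        if done:
            break

# Greedy action the agent has learned for each state.
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)})
```

The only teaching signal here is the reward returned by step(); the policy emerges from repeatedly acting, observing, and updating estimates, which is the loop the post describes.
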
Sarah Lea:
Which strategy do you use when learning something new?
3 strategies AI agents use to learn what works:
:blobcoffee: Greedy: Stick with what has worked best so far.
:blobcoffee: ε-Greedy: Mostly stick with the best, but try something new every now and then.
:blobcoffee: Optimistic Start: Assume everything is great until proven otherwise.
They all come from something called the "Multi-Armed Bandit" problem.
But they show up in real life too:
→ Trying a new café.
→ Deciding what to study.
→ Choosing which project to pursue at work.
Which one do you use most often? And should you change it?
Curious to dive deeper? I covered both topics in my latest two articles: https://towardsdatascience.com/author/schuerch_sarah/
#AI #KI #machinelearning #reinforcementlearning #learning #datascience

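As a hedged sketch of those three strategies, the snippet below runs a sample-average agent on a toy 3-armed Bernoulli bandit. The payout probabilities and parameter values are made up for illustration; only the strategy names come from the post.

```python
import random

def run_bandit(epsilon=0.0, initial_value=0.0, steps=2000, seed=0):
    """Play a 3-armed Bernoulli bandit with sample-average value estimates.

    epsilon=0,   initial_value=0   -> greedy
    epsilon=0.1, initial_value=0   -> epsilon-greedy
    epsilon=0,   initial_value=5.0 -> optimistic start
    """
    rng = random.Random(seed)
    true_probs = [0.3, 0.5, 0.7]                 # hypothetical payout rates, unknown to the agent
    estimates = [initial_value] * len(true_probs)
    counts = [0] * len(true_probs)
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(true_probs))                               # explore
        else:
            arm = max(range(len(true_probs)), key=lambda i: estimates[i])      # exploit
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental sample-average update of the chosen arm's value estimate.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return total_reward / steps

print("greedy           ", run_bandit())
print("epsilon-greedy   ", run_bandit(epsilon=0.1))
print("optimistic start ", run_bandit(initial_value=5.0))
```

With settings like these, the purely greedy agent typically locks onto whichever arm pays off first, while ε-greedy and the optimistic start keep exploring long enough to identify the best arm.
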
michabbb:
#ART Agent Reinforcement Trainer: #Opensource #RL #framework for building reliable #AI agents 🤖
🎯 Improved email agent success rate from 74% to 94% using #ReinforcementLearning on a #Qwen model. 💰 Reduced costs from $55 to $0.80 per 1,000 requests 🧵👇

Sarah Lea:
Reinforcement Learning starts with a simple but powerful idea: trial and error. Learning what works.
The Multi-Armed Bandit problem is a first step into this world. It's not just about slot machines, it's about how AI (and humans) learn to choose.
#ReinforcementLearning #AI #CognitiveScience #Psychology #Behavior #DecisionMaking #Bandits #machinelearning #KI #Datascience #datascientist

Sarah Lea:
Do you always go to the same café? Or do you try something new?
That's the exploration vs. exploitation dilemma: decision-making under uncertainty.
Multi-armed bandits model exactly that.
And this dilemma shows up everywhere: recommender systems, A/B tests, online ads, even human psychology.
Nobel Prize winner Daniel Kahneman called this one of the most fundamental cognitive patterns.
I explain what it is, why it matters, and how AI systems handle it. 🎰
:blobcoffee: Full article here: https://towardsdatascience.com/simple-guide-to-multi-armed-bandits-a-key-concept-before-reinforcement-learning/
#ReinforcementLearning #AI #CognitiveScience #Kahneman #Psychology #Behavior #DecisionMaking #Bandits #machinelearning #KI #Datascience #datascientist

N-gated Hacker News:
🤓 Ah, yes, the classic "let's scale reinforcement learning algorithms to mind-boggling #FLOPs and expect something magical" pitch. 🚀 Apparently, all it takes is sprinkling some next-token prediction dust on the entire Internet, and voilà! Genius-level #AI, because clearly, the web is a treasure trove of high-quality reasoning. 🙄
https://blog.jxmo.io/p/how-to-scale-rl-to-1026-flops
#reinforcementlearning #magic #techinnovation #mindbending #HackerNews #ngated

N-gated Hacker News:
In a world where "easy" means "convoluted" and "any" means "almost nothing," comes the groundbreaking revelation that you can now slap a shiny label on your agent and call it Reinforcement Learning™️. 🤖✨ Just sprinkle some #buzzwords and watch your productivity plummet! 🐌📉
https://openpipe.ai/blog/ruler
#ReinforcementLearning #ProductivityTech #ConvolutedSolutions #AITrends #HackerNews #ngated

Erik Jonker:
Good article on how reinforcement learning has improved current AI models. It also illustrates that today's LLMs are not just imitating.
https://arstechnica.com/ai/2025/07/how-a-big-shift-in-training-llms-led-to-a-capability-explosion/?utm_brand=arstechnica&utm_social-type=owned&utm_source=mastodon&utm_medium=social
#AI #reinforcementlearning

N-gated Hacker News:
🤖✨ Welcome to the parallel universe of "Understanding AI," where talking about reinforcement learning with "minimal math and jargon" actually means repeating the same phrase a dozen times until it loses all meaning. 🎓🔍 Because who needs #clarity when you can drown your audience in #buzzwords and call it a "deep dive"? 🌀📚
https://www.understandingai.org/p/reinforcement-learning-explained
#UnderstandingAI #ReinforcementLearning #DeepDive #HackerNews #ngated

Assn for Computing Machinery:
"Intelligence is figuring out how the world works rather than waiting for someone to tell you how the world works."
Join us as we hear from Andrew Barto and Richard Sutton, the 2024 #ACMTuringAward recipients, as they discuss their work on #ReinforcementLearning.
https://vimeo.com/1085726612

Python Weekly 🐍:
This Python class offers a multiprocessing-powered Pool for efficiently collecting and managing experience replay data in reinforcement learning.
https://github.com/NoteDance/Pool
Discussions: https://discu.eu/q/https://github.com/NoteDance/Pool
#programming #python #reinforcementlearning

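The linked repository's actual API isn't shown in the post, so the following is only a generic, hypothetical sketch of the idea it describes: worker processes generate transitions and feed a shared replay buffer that a learner can sample from. The names ReplayPool and worker are invented for illustration and are not the NoteDance/Pool interface.

```python
import multiprocessing as mp
import random

def worker(queue, n_transitions, seed):
    """Each worker process simulates an environment and pushes transitions."""
    rng = random.Random(seed)
    for _ in range(n_transitions):
        state, action = rng.random(), rng.randrange(2)
        reward, next_state = rng.random(), rng.random()
        queue.put((state, action, reward, next_state))

class ReplayPool:
    """A simple shared experience buffer fed by multiple worker processes."""
    def __init__(self, capacity=10_000):
        self.capacity = capacity
        self.buffer = []
        self.queue = mp.Queue()

    def collect(self, n_workers=4, n_transitions=1000):
        procs = [mp.Process(target=worker, args=(self.queue, n_transitions, i))
                 for i in range(n_workers)]
        for p in procs:
            p.start()
        # Drain exactly the number of transitions the workers will produce.
        for _ in range(n_workers * n_transitions):
            self.buffer.append(self.queue.get())
            if len(self.buffer) > self.capacity:
                self.buffer.pop(0)   # drop the oldest transition
        for p in procs:
            p.join()

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

if __name__ == "__main__":
    pool = ReplayPool()
    pool.collect()
    batch = pool.sample(32)
    print(len(pool.buffer), len(batch))
```
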
Compsci Weekly:
[D] My first blog, PPO to GRPO
https://medium.com/@opmyth/from-ppo-to-grpo-1681c837de5f
Discussions: https://discu.eu/q/https://medium.com/%40opmyth/from-ppo-to-grpo-1681c837de5f
#compsci #machinelearning #reinforcementlearning

Dr. Carlotta A. Berry, PhD:
#BlackInRobotics workshop series #ROS #ROS2 #Robot #Robotics #STEM #STEAM #BlackSTEM #BlackSTEAM #Drone #ComputerVision #Drones #AI #MachineLearning #Neuralnetworks #ReinforcementLearning #Learning

5h15h:
#Promptengineering is crucial for developing #LLM-based apps, but it's often manual and inefficient. PRewrite is an automated method using an LLM trained with #reinforcementlearning to optimize prompts: https://arxiv.org/pdf/2401.08189 #RL #AI