digitalcourage.social

Travis F W: <a href="https://tldr.nettime.org/@remixtures" class="u-url mention" rel="nofollow noopener" target="_blank">@remixtures</a> yeah, the LLMs we know are not trained with <a href="https://fosstodon.org/tags/reinforcementLearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#reinforcementLearning</a> afaict. Good question, actually.

IT News: Is AI really trying to escape human control and blackmail people? - In June, headlines read like science fiction: AI models "bla... - <a href="https://arstechnica.com/information-technology/2025/08/is-ai-really-trying-to-escape-human-control-and-blackmail-people/" rel="nofollow noopener" translate="no" target="_blank">https://arstechnica.com/information-technology/2025/08/is-ai-really-trying-to-escape-human-control-and-blackmail-people/</a> <a href="https://schleuss.online/tags/goalmisgeneralization" class="mention hashtag" rel="nofollow noopener" target="_blank">#goalmisgeneralization</a> <a href="https://schleuss.online/tags/reinforcementlearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#reinforcementlearning</a> <a href="https://schleuss.online/tags/largelanguagemodels" class="mention hashtag" rel="nofollow noopener" target="_blank">#largelanguagemodels</a> <a href="https://schleuss.online/tags/alignmentresearch" class="mention hashtag" rel="nofollow noopener" target="_blank">#alignmentresearch</a> <a href="https://schleuss.online/tags/palisaderesearch" class="mention hashtag" rel="nofollow noopener" target="_blank">#palisaderesearch</a> <a href="https://schleuss.online/tags/aisafetytesting" class="mention hashtag" rel="nofollow noopener" target="_blank">#aisafetytesting</a> <a href="https://schleuss.online/tags/machinelearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#machinelearning</a> <a href="https://schleuss.online/tags/jeffreyladish" class="mention hashtag" rel="nofollow noopener" target="_blank">#jeffreyladish</a> <a href="https://schleuss.online/tags/generativeai" class="mention hashtag" rel="nofollow noopener" target="_blank">#generativeai</a> <a href="https://schleuss.online/tags/aialignment" class="mention hashtag" rel="nofollow noopener" target="_blank">#aialignment</a> <a href="https://schleuss.online/tags/aideception" class="mention hashtag" rel="nofollow noopener" target="_blank">#aideception</a> <a href="https://schleuss.online/tags/claudeopus4" class="mention hashtag" rel="nofollow noopener" target="_blank">#claudeopus4</a> <a href="https://schleuss.online/tags/aibehavior" class="mention hashtag" rel="nofollow noopener" target="_blank">#aibehavior</a> <a href="https://schleuss.online/tags/airesearch" class="mention hashtag" rel="nofollow noopener" target="_blank">#airesearch</a> <a href="https://schleuss.online/tags/o3model" class="mention hashtag" rel="nofollow noopener" target="_blank">#o3model</a>

The Internet is Crack: Intelligence Is a Gray Area. Professor Michael Littman joins The Internet Is Crack to unpack AI and reinforcement learning, and to challenge what we really mean when we say a machine is “intelligent.” 🎧 <a href="https://youtu.be/N3TpwsMVeRg" rel="nofollow noopener" translate="no" target="_blank">https://youtu.be/N3TpwsMVeRg</a> <a href="https://mastodon.social/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#AI</a> <a href="https://mastodon.social/tags/ArtificialIntelligence" class="mention hashtag" rel="nofollow noopener" target="_blank">#ArtificialIntelligence</a> <a href="https://mastodon.social/tags/ReinforcementLearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#ReinforcementLearning</a> <a href="https://mastodon.social/tags/TechPodcast" class="mention hashtag" rel="nofollow noopener" target="_blank">#TechPodcast</a> <a href="https://mastodon.social/tags/EthicsInAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#EthicsInAI</a> <a href="https://mastodon.social/tags/podcast" class="mention hashtag" rel="nofollow noopener" target="_blank">#podcast</a>

Sarah Lea: Want to truly understand how AI learns to make better decisions? :blobcoffee: Start with the concept of exploration vs. exploitation and dive into one of the most basic, but super important, ideas in Reinforcement Learning: Multi-Armed Bandits. :blobcoffee: It's a simpler starting point for understanding the basics behind RL. And it might change how you see AI. → <a href="https://towardsdatascience.com/simple-guide-to-multi-armed-bandits-a-key-concept-before-reinforcement-learning/" rel="nofollow noopener" translate="no" target="_blank">https://towardsdatascience.com/simple-guide-to-multi-armed-bandits-a-key-concept-before-reinforcement-learning/</a> <a href="https://techhub.social/tags/ai" class="mention hashtag" rel="nofollow noopener" target="_blank">#ai</a> <a href="https://techhub.social/tags/ki" class="mention hashtag" rel="nofollow noopener" target="_blank">#ki</a> <a href="https://techhub.social/tags/reinforcementlearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#reinforcementlearning</a> <a href="https://techhub.social/tags/machinelearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#machinelearning</a> <a href="https://techhub.social/tags/datascience" class="mention hashtag" rel="nofollow noopener" target="_blank">#datascience</a> <a href="https://techhub.social/tags/DataScientist" class="mention hashtag" rel="nofollow noopener" target="_blank">#DataScientist</a> <a href="https://techhub.social/tags/learning" class="mention hashtag" rel="nofollow noopener" target="_blank">#learning</a> <a href="https://techhub.social/tags/decision" class="mention hashtag" rel="nofollow noopener" target="_blank">#decision</a>

The Internet is Crack: 🎙️ Artificial Intelligence: What It Really Is (and Isn’t). We had the pleasure of speaking with Prof. Michael Littman (Brown University) about the fundamentals of AI and reinforcement learning. We explore whether today’s AI systems are truly “intelligent”, or just faking it impressively well. 🎧 Listen here: <a href="https://youtu.be/N3TpwsMVeRg" rel="nofollow noopener" translate="no" target="_blank">https://youtu.be/N3TpwsMVeRg</a> <a href="https://mastodon.social/tags/ArtificialIntelligence" class="mention hashtag" rel="nofollow noopener" target="_blank">#ArtificialIntelligence</a> <a href="https://mastodon.social/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#AI</a> <a href="https://mastodon.social/tags/ReinforcementLearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#ReinforcementLearning</a> <a href="https://mastodon.social/tags/TechPodcast" class="mention hashtag" rel="nofollow noopener" target="_blank">#TechPodcast</a> <a href="https://mastodon.social/tags/OpenScience" class="mention hashtag" rel="nofollow noopener" target="_blank">#OpenScience</a> <a href="https://mastodon.social/tags/STEM" class="mention hashtag" rel="nofollow noopener" target="_blank">#STEM</a> <a href="https://mastodon.social/tags/MachineLearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#MachineLearning</a> <a href="https://mastodon.social/tags/FOSSfriendly" class="mention hashtag" rel="nofollow noopener" target="_blank">#FOSSfriendly</a> <a href="https://mastodon.social/tags/podcast" class="mention hashtag" rel="nofollow noopener" target="_blank">#podcast</a>

Sarah Lea: Can you remember learning to walk as a baby? You didn’t read a manual. Neither does an AI agent. Reinforcement Learning (RL) isn’t about knowing the correct answer. It’s about learning through trial and error, by interacting with an environment & receiving feedback. That’s how AlphaGo defeated a world champion: It first learned from expert games. Then it played against itself, millions of times, using RL to get better with each game. That’s how it mastered Go. <a href="https://techhub.social/tags/machinelearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#machinelearning</a> <a href="https://techhub.social/tags/ai" class="mention hashtag" rel="nofollow noopener" target="_blank">#ai</a> <a href="https://techhub.social/tags/ki" class="mention hashtag" rel="nofollow noopener" target="_blank">#ki</a> <a href="https://techhub.social/tags/google" class="mention hashtag" rel="nofollow noopener" target="_blank">#google</a> <a href="https://techhub.social/tags/reinforcementlearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#reinforcementlearning</a> <a href="https://techhub.social/tags/alphago" class="mention hashtag" rel="nofollow noopener" target="_blank">#alphago</a> <a href="https://techhub.social/tags/datascience" class="mention hashtag" rel="nofollow noopener" target="_blank">#datascience</a> <a href="https://techhub.social/tags/datascientist" class="mention hashtag" rel="nofollow noopener" target="_blank">#datascientist</a>

Sarah Lea: Which strategy do you use when learning something new? 3 strategies AI agents use to learn what works: :blobcoffee: Greedy: Stick with what has worked best so far. :blobcoffee: ε-Greedy: Mostly stick with the best. But try something new every now and then. :blobcoffee: Optimistic Start: Assume everything is great until proven otherwise. They all come from something called the “Multi-Armed Bandit” problem. But they show up in real life too: → Trying a new café. → Deciding what to study. → Choosing which project to pursue at work. Which one do you use most often? And should you change it? Curious to dive deeper? I covered both topics in my latest two articles: <a href="https://towardsdatascience.com/author/schuerch_sarah/" rel="nofollow noopener" translate="no" target="_blank">https://towardsdatascience.com/author/schuerch_sarah/</a> <a href="https://techhub.social/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#AI</a> <a href="https://techhub.social/tags/KI" class="mention hashtag" rel="nofollow noopener" target="_blank">#KI</a> <a href="https://techhub.social/tags/machinelearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#machinelearning</a> <a href="https://techhub.social/tags/reinforcementlearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#reinforcementlearning</a> <a href="https://techhub.social/tags/learning" class="mention hashtag" rel="nofollow noopener" target="_blank">#learning</a> <a href="https://techhub.social/tags/datascience" class="mention hashtag" rel="nofollow noopener" target="_blank">#datascience</a>
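
The three strategies named in the post above can be tried out in a few lines of Python. This is only a minimal sketch, not code from the linked articles: the function name `run_bandit`, the Bernoulli (win/lose) rewards, the constant step size of 0.1, and the arm probabilities are all assumptions chosen for illustration.

```python
import random


def run_bandit(true_means, steps=2000, epsilon=0.0, initial_value=0.0,
               step_size=0.1, seed=0):
    """Play a Bernoulli bandit; return (estimates, pull counts, total reward).

    epsilon=0, initial_value=0  -> greedy
    epsilon>0                   -> epsilon-greedy
    large initial_value         -> optimistic start (still greedy afterwards)
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [initial_value] * n_arms  # current value estimate per arm
    counts = [0] * n_arms                 # number of pulls per arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick any arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the best-looking arm, breaking ties at random.
            best = max(estimates)
            arm = rng.choice([a for a in range(n_arms) if estimates[a] == best])
        # Hypothetical environment: arm pays 1 with probability true_means[arm].
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Move the estimate a fixed fraction of the way toward the new reward.
        estimates[arm] += step_size * (reward - estimates[arm])
        total_reward += reward
    return estimates, counts, total_reward


if __name__ == "__main__":
    arms = [0.2, 0.5, 0.8]  # assumed payout probabilities, unknown to the agent
    strategies = [
        ("greedy", {}),
        ("epsilon-greedy", {"epsilon": 0.1}),
        ("optimistic start", {"initial_value": 5.0}),
    ]
    for name, kwargs in strategies:
        _, counts, total = run_bandit(arms, **kwargs)
        print(f"{name:17s} pulls per arm: {counts}, total reward: {total:.0f}")
```

Comparing the pull counts per arm shows how much each strategy explores. In this setup the plain greedy agent tends to lock onto whichever arm pays out first, while the optimistic start is guaranteed to try every arm at least once, because each pull lowers that arm's inflated estimate below the untried ones.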

michabbb: <a href="https://social.vivaldi.net/tags/ART" class="mention hashtag" rel="nofollow noopener" target="_blank">#ART</a> Agent Reinforcement Trainer: <a href="https://social.vivaldi.net/tags/Opensource" class="mention hashtag" rel="nofollow noopener" target="_blank">#Opensource</a> <a href="https://social.vivaldi.net/tags/RL" class="mention hashtag" rel="nofollow noopener" target="_blank">#RL</a> <a href="https://social.vivaldi.net/tags/framework" class="mention hashtag" rel="nofollow noopener" target="_blank">#framework</a> for building reliable <a href="https://social.vivaldi.net/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#AI</a> agents 🤖🎯 Improved email agent success rate from 74% to 94% using <a href="https://social.vivaldi.net/tags/ReinforcementLearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#ReinforcementLearning</a> on a <a href="https://social.vivaldi.net/tags/Qwen" class="mention hashtag" rel="nofollow noopener" target="_blank">#Qwen</a> model 💰 Reduced costs from $55 to $0.80 per 1,000 requests 🧵👇

Sarah Lea: Reinforcement Learning starts with a simple but powerful idea: Trial & Error. Learning what works. The Multi-Armed Bandit problem is a first step into this world. It's not just about slot machines. It's about how AI (and humans) learn to choose. <a href="https://techhub.social/tags/ReinforcementLearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#ReinforcementLearning</a> <a href="https://techhub.social/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#AI</a> <a href="https://techhub.social/tags/CognitiveScience" class="mention hashtag" rel="nofollow noopener" target="_blank">#CognitiveScience</a> <a href="https://techhub.social/tags/Psychology" class="mention hashtag" rel="nofollow noopener" target="_blank">#Psychology</a> <a href="https://techhub.social/tags/Behavior" class="mention hashtag" rel="nofollow noopener" target="_blank">#Behavior</a> <a href="https://techhub.social/tags/DecisionMaking" class="mention hashtag" rel="nofollow noopener" target="_blank">#DecisionMaking</a> <a href="https://techhub.social/tags/Bandits" class="mention hashtag" rel="nofollow noopener" target="_blank">#Bandits</a> <a href="https://techhub.social/tags/machinelearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#machinelearning</a> <a href="https://techhub.social/tags/KI" class="mention hashtag" rel="nofollow noopener" target="_blank">#KI</a> <a href="https://techhub.social/tags/Datascience" class="mention hashtag" rel="nofollow noopener" target="_blank">#Datascience</a> <a href="https://techhub.social/tags/datascientist" class="mention hashtag" rel="nofollow noopener" target="_blank">#datascientist</a>

Sarah Lea: Do you always go to the same café? Or do you try something new? That’s the exploration vs. exploitation dilemma: decision-making under uncertainty. Multi-armed bandits model exactly that. And this dilemma shows up everywhere: recommender systems, A/B tests, online ads, even in human psychology. Nobel Prize winner Daniel Kahneman called this one of the most fundamental cognitive patterns. I explain what it is, why it matters, and how AI systems handle it. 🎰 :blobcoffee: Full article here: <a href="https://towardsdatascience.com/simple-guide-to-multi-armed-bandits-a-key-concept-before-reinforcement-learning/" rel="nofollow noopener" translate="no" target="_blank">https://towardsdatascience.com/simple-guide-to-multi-armed-bandits-a-key-concept-before-reinforcement-learning/</a> <a href="https://techhub.social/tags/ReinforcementLearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#ReinforcementLearning</a> <a href="https://techhub.social/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#AI</a> <a href="https://techhub.social/tags/CognitiveScience" class="mention hashtag" rel="nofollow noopener" target="_blank">#CognitiveScience</a> <a href="https://techhub.social/tags/Kahneman" class="mention hashtag" rel="nofollow noopener" target="_blank">#Kahneman</a> <a href="https://techhub.social/tags/Psychology" class="mention hashtag" rel="nofollow noopener" target="_blank">#Psychology</a> <a href="https://techhub.social/tags/Behavior" class="mention hashtag" rel="nofollow noopener" target="_blank">#Behavior</a> <a href="https://techhub.social/tags/DecisionMaking" class="mention hashtag" rel="nofollow noopener" target="_blank">#DecisionMaking</a> <a href="https://techhub.social/tags/Bandits" class="mention hashtag" rel="nofollow noopener" target="_blank">#Bandits</a> <a href="https://techhub.social/tags/machinelearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#machinelearning</a> <a href="https://techhub.social/tags/KI" class="mention hashtag" rel="nofollow noopener" target="_blank">#KI</a> <a href="https://techhub.social/tags/Datascience" class="mention hashtag" rel="nofollow noopener" target="_blank">#Datascience</a> <a href="https://techhub.social/tags/datascientist" class="mention hashtag" rel="nofollow noopener" target="_blank">#datascientist</a>

N-gated Hacker News: 🤓 Ah, yes, the classic "let's scale reinforcement learning algorithms to mind-boggling <a href="https://mastodon.social/tags/FLOPs" class="mention hashtag" rel="nofollow noopener" target="_blank">#FLOPs</a> and expect something magical" pitch. 🚀 Apparently, all it takes is sprinkling some next-token prediction dust on the entire Internet, and voilà! Genius-level <a href="https://mastodon.social/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#AI</a>, because clearly, the web is a treasure trove of high-quality reasoning. 🙄 <a href="https://blog.jxmo.io/p/how-to-scale-rl-to-1026-flops" rel="nofollow noopener" translate="no" target="_blank">https://blog.jxmo.io/p/how-to-scale-rl-to-1026-flops</a> <a href="https://mastodon.social/tags/reinforcementlearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#reinforcementlearning</a> <a href="https://mastodon.social/tags/magic" class="mention hashtag" rel="nofollow noopener" target="_blank">#magic</a> <a href="https://mastodon.social/tags/techinnovation" class="mention hashtag" rel="nofollow noopener" target="_blank">#techinnovation</a> <a href="https://mastodon.social/tags/mindbending" class="mention hashtag" rel="nofollow noopener" target="_blank">#mindbending</a> <a href="https://mastodon.social/tags/HackerNews" class="mention hashtag" rel="nofollow noopener" target="_blank">#HackerNews</a> <a href="https://mastodon.social/tags/ngated" class="mention hashtag" rel="nofollow noopener" target="_blank">#ngated</a>

N-gated Hacker News: In a world where "easy" means "convoluted" and "any" means "almost nothing," comes the groundbreaking revelation that you can now slap a shiny label on your agent and call it Reinforcement Learning™️. 🤖✨ Just sprinkle some <a href="https://mastodon.social/tags/buzzwords" class="mention hashtag" rel="nofollow noopener" target="_blank">#buzzwords</a> and watch your productivity plummet! 🐌📉 <a href="https://openpipe.ai/blog/ruler" rel="nofollow noopener" translate="no" target="_blank">https://openpipe.ai/blog/ruler</a> <a href="https://mastodon.social/tags/ReinforcementLearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#ReinforcementLearning</a> <a href="https://mastodon.social/tags/ProductivityTech" class="mention hashtag" rel="nofollow noopener" target="_blank">#ProductivityTech</a> <a href="https://mastodon.social/tags/ConvolutedSolutions" class="mention hashtag" rel="nofollow noopener" target="_blank">#ConvolutedSolutions</a> <a href="https://mastodon.social/tags/AITrends" class="mention hashtag" rel="nofollow noopener" target="_blank">#AITrends</a> <a href="https://mastodon.social/tags/HackerNews" class="mention hashtag" rel="nofollow noopener" target="_blank">#HackerNews</a> <a href="https://mastodon.social/tags/ngated" class="mention hashtag" rel="nofollow noopener" target="_blank">#ngated</a>

Erik Jonker: Good article on how reinforcement learning improved current AI models. It also illustrates that today's LLMs are not just imitating. <a href="https://arstechnica.com/ai/2025/07/how-a-big-shift-in-training-llms-led-to-a-capability-explosion/?utm_brand=arstechnica&utm_social-type=owned&utm_source=mastodon&utm_medium=social" rel="nofollow noopener" translate="no" target="_blank">https://arstechnica.com/ai/2025/07/how-a-big-shift-in-training-llms-led-to-a-capability-explosion/?utm_brand=arstechnica&utm_social-type=owned&utm_source=mastodon&utm_medium=social</a> <a href="https://mastodon.social/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#AI</a> <a href="https://mastodon.social/tags/reinforcementlearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#reinforcementlearning</a>

N-gated Hacker News: 🤖✨ Welcome to the parallel universe of "Understanding AI," where talking about reinforcement learning with "minimal math and jargon" actually means repeating the same phrase a dozen times until it loses all meaning. 🎓🔍 Because who needs <a href="https://mastodon.social/tags/clarity" class="mention hashtag" rel="nofollow noopener" target="_blank">#clarity</a> when you can drown your audience in <a href="https://mastodon.social/tags/buzzwords" class="mention hashtag" rel="nofollow noopener" target="_blank">#buzzwords</a> and call it a "deep dive"? 🌀📚 <a href="https://www.understandingai.org/p/reinforcement-learning-explained" rel="nofollow noopener" translate="no" target="_blank">https://www.understandingai.org/p/reinforcement-learning-explained</a> <a href="https://mastodon.social/tags/UnderstandingAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#UnderstandingAI</a> <a href="https://mastodon.social/tags/ReinforcementLearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#ReinforcementLearning</a> <a href="https://mastodon.social/tags/DeepDive" class="mention hashtag" rel="nofollow noopener" target="_blank">#DeepDive</a> <a href="https://mastodon.social/tags/HackerNews" class="mention hashtag" rel="nofollow noopener" target="_blank">#HackerNews</a> <a href="https://mastodon.social/tags/ngated" class="mention hashtag" rel="nofollow noopener" target="_blank">#ngated</a>

Assn for Computing Machinery: "Intelligence is figuring out how the world works rather than waiting for someone to tell you how the world works." Join us as we hear from Andrew Barto and Richard Sutton, the 2024 <a href="https://mastodon.acm.org/tags/ACMTuringAward" class="mention hashtag" rel="nofollow noopener" target="_blank">#ACMTuringAward</a> recipients, as they discuss their work on <a href="https://mastodon.acm.org/tags/ReinforcementLearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#ReinforcementLearning</a>. <a href="https://vimeo.com/1085726612" rel="nofollow noopener" translate="no" target="_blank">https://vimeo.com/1085726612</a>

Python Weekly 🐍: This Python class offers a multiprocessing-powered Pool for efficiently collecting and managing experience replay data in reinforcement learning. <a href="https://github.com/NoteDance/Pool" rel="nofollow noopener" translate="no" target="_blank">https://github.com/NoteDance/Pool</a> Discussions: <a href="https://discu.eu/q/https://github.com/NoteDance/Pool" rel="nofollow noopener" translate="no" target="_blank">https://discu.eu/q/https://github.com/NoteDance/Pool</a> <a href="https://mastodon.social/tags/programming" class="mention hashtag" rel="nofollow noopener" target="_blank">#programming</a> <a href="https://mastodon.social/tags/python" class="mention hashtag" rel="nofollow noopener" target="_blank">#python</a> <a href="https://mastodon.social/tags/reinforcementlearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#reinforcementlearning</a>
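
For readers unfamiliar with experience replay, the idea is to store the transitions an agent observes and later sample random batches of them for training. The sketch below shows only that idea in a single process. It is not the API of the linked Pool library and does not use multiprocessing; the names `Transition` and `ReplayBuffer` are assumptions made for illustration.

```python
import random
from collections import deque
from typing import NamedTuple


class Transition(NamedTuple):
    """One step of experience (hypothetical layout)."""
    state: tuple
    action: int
    reward: float
    next_state: tuple
    done: bool


class ReplayBuffer:
    """Fixed-size store that drops the oldest transition when full."""

    def __init__(self, capacity, seed=None):
        self._buffer = deque(maxlen=capacity)  # deque evicts the oldest item itself
        self._rng = random.Random(seed)

    def add(self, transition):
        self._buffer.append(transition)

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions.
        return self._rng.sample(list(self._buffer), batch_size)

    def __len__(self):
        return len(self._buffer)
```

A training loop would call `add` after every environment step and `sample` whenever it wants a batch to learn from.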

Compsci Weekly: [D] My first blog, PPO to GRPO <a href="https://medium.com/@opmyth/from-ppo-to-grpo-1681c837de5f" rel="nofollow noopener" translate="no" target="_blank">https://medium.com/@opmyth/from-ppo-to-grpo-1681c837de5f</a> Discussions: <a href="https://discu.eu/q/https://medium.com/%40opmyth/from-ppo-to-grpo-1681c837de5f" rel="nofollow noopener" translate="no" target="_blank">https://discu.eu/q/https://medium.com/%40opmyth/from-ppo-to-grpo-1681c837de5f</a> <a href="https://mastodon.social/tags/compsci" class="mention hashtag" rel="nofollow noopener" target="_blank">#compsci</a> <a href="https://mastodon.social/tags/machinelearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#machinelearning</a> <a href="https://mastodon.social/tags/reinforcementlearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#reinforcementlearning</a>

Dr. Carlotta A. Berry, PhD: <a href="https://blacktwitter.io/tags/BlackInRobotics" class="mention hashtag" rel="nofollow noopener" target="_blank">#BlackInRobotics</a> workshop series <a href="https://blacktwitter.io/tags/ROS" class="mention hashtag" rel="nofollow noopener" target="_blank">#ROS</a> <a href="https://blacktwitter.io/tags/ROS2" class="mention hashtag" rel="nofollow noopener" target="_blank">#ROS2</a> <a href="https://blacktwitter.io/tags/Robot" class="mention hashtag" rel="nofollow noopener" target="_blank">#Robot</a> <a href="https://blacktwitter.io/tags/Robotics" class="mention hashtag" rel="nofollow noopener" target="_blank">#Robotics</a> <a href="https://blacktwitter.io/tags/STEM" class="mention hashtag" rel="nofollow noopener" target="_blank">#STEM</a> <a href="https://blacktwitter.io/tags/STEAM" class="mention hashtag" rel="nofollow noopener" target="_blank">#STEAM</a> <a href="https://blacktwitter.io/tags/BlackSTEM" class="mention hashtag" rel="nofollow noopener" target="_blank">#BlackSTEM</a> <a href="https://blacktwitter.io/tags/BlackSTEAM" class="mention hashtag" rel="nofollow noopener" target="_blank">#BlackSTEAM</a> <a href="https://blacktwitter.io/tags/Drone" class="mention hashtag" rel="nofollow noopener" target="_blank">#Drone</a> <a href="https://blacktwitter.io/tags/ComputerVision" class="mention hashtag" rel="nofollow noopener" target="_blank">#ComputerVision</a> <a href="https://blacktwitter.io/tags/Drones" class="mention hashtag" rel="nofollow noopener" target="_blank">#Drones</a> <a href="https://blacktwitter.io/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#AI</a> <a href="https://blacktwitter.io/tags/MachineLearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#MachineLearning</a> <a href="https://blacktwitter.io/tags/Neuralnetworks" class="mention hashtag" rel="nofollow noopener" target="_blank">#Neuralnetworks</a> <a href="https://blacktwitter.io/tags/ReinforcementLearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#ReinforcementLearning</a> <a href="https://blacktwitter.io/tags/Learning" class="mention hashtag" rel="nofollow noopener" target="_blank">#Learning</a>

5h15h: <a href="https://techhub.social/tags/Promptengineering" class="mention hashtag" rel="nofollow noopener" target="_blank">#Promptengineering</a> is crucial for developing <a href="https://techhub.social/tags/LLM" class="mention hashtag" rel="nofollow noopener" target="_blank">#LLM</a>-based apps, but it's often manual & inefficient. PRewrite is an automated method using an LLM trained with <a href="https://techhub.social/tags/reinforcementlearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#reinforcementlearning</a> to optimize prompts: <a href="https://arxiv.org/pdf/2401.08189" rel="nofollow noopener" translate="no" target="_blank">https://arxiv.org/pdf/2401.08189</a> <a href="https://techhub.social/tags/RL" class="mention hashtag" rel="nofollow noopener" target="_blank">#RL</a> <a href="https://techhub.social/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#AI</a>

#reinforcementlearning