GPT-3 Creative Fiction (2020) (gwern.net)
83 points by ijidak 5 months ago | 44 comments



My work on constrained text generation / filter assisted decoding for LLMs is cited in this article! One of my proudest moments was being noticed by my senpai Gwern!

https://paperswithcode.com/paper/most-language-models-can-be...

I want to add that just because GPT-4 appears to be far better at following constraints doesn't mean it's anywhere near perfect at following them. It's better now at my easy example of "ban the letter e", but if you ask for several constraints, or mix lexical and phonetic constraints, it gets pretty awful pretty quickly. Filter-assisted decoding can make any LLM (no matter how awful it is) follow constraints perfectly.

I can't wait to get someone who's better at coding than me to implement these techniques in the major LLM frontends (oobabooga, llama.cpp, etc.), since my attempt at it was quite poopy research code: https://github.com/hellisotherpeople/constrained-text-genera... - especially the idea of doing this with all LLM types (e.g. BERT), using a declarative syntax like LangChain, and also supporting constrained beam search for sequence-level constraints.

BTW - just to motivate the nay-sayers on why filtering LLM vocabularies may be a cool (and dangerous) technique even in a world where LLMs become "perfect" at following lexical, semantic, or phonetic constraints in the prompt: imagine asking ChatGPT to "generate some social security numbers". Now imagine doing that again when it's only allowed to generate digits or hyphens. This is one of the many reasons why full vocabulary probability distributions are not exposed by folks like OpenAI.
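To make the idea concrete, here's a minimal sketch of what filter-assisted decoding looks like with Hugging Face transformers. The model choice, the digits-and-hyphens constraint, and the variable names are just illustrative assumptions, not code from my repo above:

```python
# Minimal sketch of filter-assisted decoding: at every step, mask out all
# vocabulary entries that violate a constraint before picking the next token.
# Illustrative only -- model, constraint, and names are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Constraint: only digits and hyphens may be generated (cf. the SSN example).
allowed_chars = set("0123456789-")
vocab_mask = torch.zeros(len(tokenizer), dtype=torch.bool)
for token_id in range(len(tokenizer)):
    text = tokenizer.decode([token_id])
    if text and all(ch in allowed_chars for ch in text):
        vocab_mask[token_id] = True

input_ids = tokenizer("A social security number looks like ", return_tensors="pt").input_ids
for _ in range(11):  # generate 11 constrained tokens
    logits = model(input_ids).logits[0, -1]
    logits[~vocab_mask] = float("-inf")  # filter: forbidden tokens can never win
    next_id = torch.argmax(logits).unsqueeze(0).unsqueeze(0)
    input_ids = torch.cat([input_ids, next_id], dim=1)

print(tokenizer.decode(input_ids[0]))
```

The same masking trick works with sampling or beam search instead of greedy argmax; the point is that the constraint is enforced in the decoder, so the model cannot violate it no matter what the prompt says.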


I've found constrained generation to produce the most creative results.

Example: The stench of smoke and blood filled their nostrils as they crouched behind a fallen tree, its splintered trunk providing the only cover from the relentless hail of bullets, their heart hammering in their chest, threatening to burst free, as sweat mingled with dirt on their face, catching the frightened gaze of their comrades, their eyes wide with terror yet burning with determination, knowing they were all in this together with no turning back, seizing the opportunity as the gunfire briefly subsided, yelling "Move up! We've got to take that hill!" their voice breaking through the deafening cacophony of war, breaking cover without hesitation, boots pounding the earth as they ran through the hellish landscape, explosions erupting all around, the concussive force rattling their bones and sending tremors through the ground beneath their feet, seeing a fellow soldier fall, his anguished cry piercing their soul, diving to his side, dragging him into the meager cover of a shallow trench, his blood staining their hands, no time to mourn, continuing their charge, every muscle screaming in protest, lungs pleading for air, cresting the hill, the enemy bunker coming into view, sprinting towards it with a final surge of adrenaline, their hand shaking as they lobbed a grenade inside, the explosion tearing through the bunker, debris flying like deadly shrapnel, winning this small battle, but the weight of the war still pressing down on their shoulders, gazing out at the smoke-choked battlefield, wondering how many more battles they'd have to endure before the war was truly over.


I've found Anthropic's Claude[1] to be far better for creative writing than GPT-4 and even Claude+.

[1] - https://poe.com/Claude-instant


I really need to try Claude, but, well, so much to keep up with these days... FWIW, I've found that GPT-4 in the Playground is pretty good at creative text compared to ChatGPT-3, with a bit of prompting. It's still sometimes inflexible, but mostly works. (GPT-4 inherits a lot of the weaknesses of GPT-3 for creative text, unfortunately; you can fool yourself into thinking that it's genuinely learned to rhyme or what phonetics are, but underneath, it still has the same BPE flaws.)

I am still working on some projects there - I'm most excited about an experiment into translating John Milton's _Paradise Lost_ into contemporary alliterative English verse, using an inner-monologue approach to revise it line by line repeatedly: https://www.reddit.com/r/MachineLearning/comments/12pqqg6/di... Sample of intro 'translation': https://pastebin.com/WmHKD9RP

GPT-4 suggested titling it _Perished Paradise_, and I rather like it. :)


I'd like to be presented with more information before a random website demands my phone number.


You don't have to give it your phone number. All you need is an email.


How long have you felt this way? I tested Turbo-Claude a month ago and wasn't impressed.


I don't know what "Turbo-Claude" is, but, like I said, in my experience Claude+ is worse than plain Claude (aka "Claude-Instant") at writing creative works.


He should do a follow-up on GPT-4.

BTW, on a different note, would people consider Gwern the best writer in the rationalist community or are Yud and Slate Star Codex considered better?


Exactly, GPT-3 is technically obsolete, but it's what you get with ChatGPT, unless you are paying for it and know how to ask for access to GPT-4.

A lot of the criticism (almost all of it) you read on the quality of various language models comes from people trying out ChatGPT without doing that. And while it's alright, it indeed has many flaws, which a lot of people are quick to point out before they give up.

Relative to that, GPT-4 is trained on more data and more languages, is less inclined to hallucinate (it still does sometimes, but a lot less), and, if you know how to access this, it knows how to use tools via plugins.

After you figure out how to access GPT-4 properly, it mostly boils down to knowing what to ask and actually thinking of asking it to begin with. Then you need to follow up with more questions to refine the answer, etc. Shit in, shit out, basically. Using it properly is actually a lot of work, and it only makes sense if your need is big enough. It's like every other tool really: having the tool doesn't make you magically better until you learn how to use it properly. Asking naive questions to which you already know the answers is not a productive way to use it, or to learn how to use it. It gets better when you step outside your comfort zone and ask it the things you don't know that you need to know. The more specific your request, the more helpful it gets.

A big limitation is actually the UX. Chat is very accessible but not necessarily very user friendly; in a way it was a happy accident for OpenAI that it works so well. But there are tools and extensions that provide a better experience. For example, with Code GPT (configured to use GPT-4), you can get it to critique code selections, suggest improvements, write documentation, etc. All you need to do is ask. With GPT for Docs, you can select bits of text and ask it to improve them, critique them, expand on them, suggest counterarguments, additional arguments, or supporting facts, translate, simplify, etc. A good writer will be able to ask better questions and get better results. The more text you select, the slower it gets. Use it to brainstorm, explore topics, refine, etc.


The limitation of GPT-4 and the chat models in general is that they are adamant about declaring themselves "as an AI language model".

Coercing them into specific output is becoming harder and harder, and postprocessing to cut the fat is tedious to maintain and unreliable, plus who wants that in their pipeline.

So yeah, they are fine for having a chat about a text passage, but as authoring tools they are heavy, slow, and require a lot of manually moving strings back and forth.

(Never mind that you pay for the tokens to generate that "as an AI language model I can" warning.)


Great points, especially about UX, and not just for ChatGPT. On the OpenAI chat website I keep pressing the up arrow, like in a terminal, thinking it will let me re-edit my prompt. But without an API key with access, this is all I have.

Same with Midjourney. I'm not a Discord user, but it is so painful to use compared to AUTOMATIC1111 plus extensions, which only scratch the surface of what a powerful UI can be for a given generative AI use case.


> After you figure out how to access gpt-4 properly, it mostly boils down to knowing what to ask and actually thinking of asking it to begin with.

Agreed, but I don't think this goes broad or deep enough, particularly with respect to the power of iteration and recursion in conversation: the basics of reflection over prior action. The key here is to realize that there's a natural sensemaking process that goes from action to reflection to repetition, and that sensemaking process is a collaboration between computer agents and human agents, whether it remains implicit or explicit in the design, and whether the computer agent is programmed to recognize and resolve conflict and consensus in not just subjective opinion but objective fact. Right now, we aren't even at the latter, let alone the former. That will come with time and with acceptance of the veracity problem that is writ large in the current AI UX.

> it mostly boils down to knowing what to ask and actually thinking of asking it to begin with.

Indeed, recognition vs. recall is as old as GUI vs. CLI. However, the key is to realize that some folks will be faster through recall because they've actually built up the knowledge to recall vs the symbolism to recognize.

My argument is that, for me, there's little value in chat because that's not how I prefer to interact with any human or computer agent via writing. It's too slow. I much prefer the batch mode where there is larger latency between question and answer, but that latency provides an emergent cadence to the conversation that is more natural, providing room for listening-and-thought-before-response. This may sound old-fashioned like punch cards, but conversation has been missing from many technical things for years and we've paid the price for that silence.


> Exactly, GPT 3 is technically obsolete. But it's what you get with chat gpt. Unless you are paying for it and know how to ask for access to gpt-4.

My suggestion: create a developer account on platform.openai.com and use their 'playground'. You pay per 1k tokens; I've been using it fairly regularly, and with GPT-4 it usually ends up being $0.10-$0.30 per day. You can't use plugins, however.
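If you'd rather script it than use the Playground UI, here's roughly what pay-per-token access looks like with the openai Python package as it existed in mid-2023; the prompt is a placeholder, and you still need an API key that has been granted GPT-4 access:

```python
# Minimal sketch of pay-per-token GPT-4 access through the API instead of the
# ChatGPT UI, using the openai Python package (pre-1.0, mid-2023 style).
import openai

openai.api_key = "sk-..."  # your developer key from platform.openai.com

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a concise writing assistant."},
        {"role": "user", "content": "Critique the opening paragraph of my short story: ..."},
    ],
    temperature=0.7,
)

print(response.choices[0].message["content"])
# Billing is per 1k prompt/completion tokens; the counts are in response["usage"].
```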


ChatGPT uses the gpt-3.5-turbo model.


What do you mean by "best writer"? One could argue that no writer is better than another, there are just people with different tastes.


There are objective measurements that people can agree on, even in subjective realms.

In terms of popular impact, SSC obviously wins. However among "insiders" there are always implicitly lauded favorites. Is it about the general capacity to convey complex topics succinctly, general "immersion" in the writing and ability to consistently foster a compelling mind-space via the imagery and ideas handled in the text, etc?

This is just me being a curious outsider. Interestingly I often peer into insular communities and find that there's often an unsaid tapestry of implicit assumptions and customs. All that I've discovered so far is that SSC is the darling favorite, Yud is liked but also made fun of for his behavior and more blunt approach, and Gwern is liked but not as notable, though imo he has the best website.

Is the best way to peer into these ideas a prolonged, investigative kind of outside observation, or should one only feel a worthy potential to fit in with the common character of the crowd if they can take a short peek at the general shape of the culture and immediately "get it"?


Maybe original, independent, and interesting thoughts have value on their own, beyond whether they potentially get you a seat at the cool kids' lunch table.


Sure, but I think when compared 1-to-1 some people are generally perceived to be better writers ... I'm not sure if GPT-4 would be significantly better than GPT-3 ... might depend on the prompt and other issues.


Scott is a lot more accessible and engaging as a writer and did a lot to turn SSC/ACX into a platform for other like-minded bloggers. Gwern’s content is a lot more niche and he’s also less consistent with his output, but what he does put up is great. I don’t think anyone reads Yud anymore - all of his best writing was 10+ years ago and nowadays he’s a furry mascot for the scene who occasionally engages in apocalyptic screeching


Gwern had the best early take on GPT-3 and how it was so amazing.

Yudkowsky has obviously thought about AI so much, and he has many interesting ideas and is the highest-status poster on lesswrong.com, but he admits he is not a great writer, and I think not all his ideas are so great. For example, I like Paul Christiano's ideas more (https://news.ycombinator.com/item?id=35635345), although I have to credit Yudkowsky because without such debate I am sure Christiano's ideas wouldn't have been put as clearly or as publicly. Here's one of Yudkowsky's own tweets (March 24, 2023) where he self-deprecatingly roasts his own bad writing by showing how GPT-4 writes it more clearly: https://twitter.com/ESYudkowsky/status/1639425421761712129

Slate Star Codex I don't like as much as either of them, and I don't think it has such new or interesting ideas; sorry, it's only my opinion.


Yud is not even in contention anymore given his AI fear-mongering.


So you have a nice master argument why aligning a superintelligence will be easy?


Hint: they don't.

People are in the denial stage right now.


Luckily it seems the Overton window is shifting quickly now.


Personally yes I think Gwern is the best, but Scott influenced me more earlier on.


Stumbling upon gwern.net in 2012 was life-changing for me, though.


Related:

GPT-3 Creative Fiction - https://news.ycombinator.com/item?id=23722635 - July 2020 (97 comments)

GPT-3 - https://news.ycombinator.com/item?id=23623845 - June 2020 (200 comments)


I really want a high powered instruction-following LLM that's capable of assisting with writing fiction. GPT-3.5-turbo is fantastically capable at this if you can get it to drop its stilted AI-assistant voice, but doing so is too hard and too unreliable to be practical. I presume GPT-4 is even better but even harder to convince to try.

It's frustrating to see through those glimpses that this capability exists, but has been trampled over and made mostly inaccessible in pursuit of making sure the tool doesn't ever offend by producing something that's not baby food. And all the open replication efforts are aping this approach. Sigh.


I agree.

It's like instruct = gain consistency, lose creativity. Base = no consistency (really), but creative.

Could a model be trained or positioned to be in the middle of these two?


I suspect you could train a model to just shut up and follow instructions. I.e. instead of "Do X -> Sure! As a large language model, I'd love to help you with X!", just "Do X -> X".

This would avoid giving the model a dull default voice. But it wouldn't be sufficiently hedged and "safe".
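To illustrate the contrast, here are a couple of purely made-up fine-tuning pairs (not from any real dataset) showing the chatty default style versus the "Do X -> X" style suggested above:

```python
# Purely illustrative instruction-tuning pairs; the prompts and completions
# are invented to show the format difference, not taken from any dataset.
chatty_style = [
    {"prompt": "Write a haiku about rain.",
     "completion": "Sure! As a large language model, I'd love to help. Here is a haiku: ..."},
]

direct_style = [
    {"prompt": "Write a haiku about rain.",
     "completion": "Soft grey morning falls / each leaf bends beneath the weight / of a patient sky"},
]
# Training only on pairs like direct_style would teach the model to answer
# without the assistant preamble -- which is exactly the "unhedged" behavior
# the parent comment notes current tuning avoids.
```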




