AI Instruction Insider April 2025
AI image generation gets a significant upgrade with GPT-4o – here’s why it matters for tech writers
What is the AI news?
OpenAI has rolled out a big update: it’s now using GPT-4o (short for "omni") for image generation in ChatGPT, replacing DALL·E 3. GPT-4o is a multimodal model, meaning it can take text, audio, image, and video input and produce text, audio, and images. Its image generation is smarter, sharper, and more responsive, thanks to reinforcement learning from human feedback (RLHF). Think: more accurate visuals, better text rendering in images, transparent backgrounds (hello, logo designers!), and easy editing of existing images.
Even better? This upgrade is available to everyone — Free, Plus, Team, and Pro users — although free-tier users might see a delay due to high demand.
Why it matters
We’re often tasked with creating or sourcing visuals — diagrams, mockups, UI examples, icons — for documentation. This update means we can now generate high-quality, custom images right inside ChatGPT without bouncing between tools. Transparent backgrounds? Perfect for integrating visuals into presentations or layered layouts. Need to iterate on an image after a manager’s feedback? GPT-4o supports multi-turn interactions, so you can tweak the image step by step.
Also worth noting: it can edit existing images. So if you’re working with a product screenshot and need to adjust labels or highlight a feature, you now have an AI helper for that.
How it helps you
Here are a few quick wins:
- Rapid visual prototyping: Describe the concept you need (e.g., “a flowchart showing user login process with icons for each step”) and get a visual in seconds.
- Brand-ready graphics: Transparent backgrounds make it easy to generate visuals that integrate with your organisation’s branding.
Try: “A minimalist gear icon with a transparent background, flat design, blue and grey tones.”
- Visual iteration on the fly: Say goodbye to clunky editing tools. Just tell ChatGPT what to change.
Try: “Change the background to white and replace the database icon with a cloud.”
- UI/UX mockups for documentation:
Prompt: “A mockup of a mobile app login screen with fields for email and password, and a ‘Sign In’ button.”
Creating IKEA-style line illustrations
We wanted to go the extra mile and test whether it's now possible to create line illustrations and even step-by-step installation instructions based on photos. In the first part of our experiment, we asked GPT-4o to create a line illustration based on an image of a coffee machine.
In the second part of our experiment, we photographed the assembly of a microphone and asked an AI to turn the images into an IKEA-style user guide.
Line drawing
Prompt: Create an isometric line drawing of this coffee machine

Assembly instruction - Attempt #1
Prompt: Please generate an IKEA-style user manual (PDF).

Assembly instruction - Attempt #2
Prompt: Can you turn the attached 4 steps into an IKEA-style user manual? Add arrows to indicate movement and fastening of product elements.
The steps are:
- Place the microphone in the mount. Attach the microphone with the ring.
- Slide the pop filter onto the mount. Lock the pop filter by turning the thumb screw.

The result looks better, but it’s still not perfect.

We therefore adjusted the prompt, which produced the illustrations below.

While the image generator shows promising results in creating IKEA-style instruction manuals, it's not perfect yet. In our test, the visuals captured the general idea but occasionally missed key details or depicted confusing steps — mainly when illustrating more nuanced assembly actions.
To improve clarity and output quality, we suggest providing more detailed descriptions and breaking the process into additional steps. Of course, this feature is still in its early stages, and we’re optimistic that future updates will bring sharper accuracy and smarter visual reasoning.
For now, the most effective approach is to use GPT-4o to generate isometric line drawings and then refine them manually in a tool such as Adobe InDesign.
Gemini 2.0 Flash now edits images via chat
What’s the news?
Google just rolled out a significant upgrade to its Gemini 2.0 Flash (Image Generation Experimental) model — its image generation capabilities have levelled up. For the first time, Gemini can now edit photos conversationally through chat. It also has a much sharper understanding of both pictures and video, making it more responsive to specific visual instructions. In testing, users can feed Gemini a photo and request precise edits, like converting a photo into a clean, line-art illustration in the style of a technical manual. Though it's not flawless yet and may take a few tries, it's a glimpse of how powerful this tool could become.
Why is it important?
This matters big time for technical writers who constantly need clear, consistent visuals—think product manuals, assembly guides, or diagrams. Traditionally, creating those hand-drawn, line-art illustrations involves either outsourcing to a graphic designer or spending hours tinkering with illustration software. With Gemini 2.0’s chat-based image editing, you could soon bypass that entirely. Imagine simply describing the visual style and details you need—and letting AI handle the heavy lifting.
We created the image below with the following prompt:
“Please convert the attached photo of the assembly of a microphone into a clean, line-art, hand-drawn style illustration, similar to the illustration shown in the reference image taken from another user manual.
Requirements:
- Style: Match the illustration style — simple, technical, black line-art without shading, suitable for inclusion in an instruction manual.
- Hand Position: Keep the exact hand position as in the photo.
- Details: Do not include any background elements such as the desk, drawers, or any surrounding objects—only illustrate the hand.
- Perspective: Maintain the same perspective and proportions.
- Output: Provide the result on a plain white background, high resolution, suitable for print manuals.”

How can it help you?
Here’s how you might put this to work:
- Rapid prototyping of visuals: Draft illustrations in minutes, not hours, by feeding in photos and describing the desired output style.
- Consistent style across documents: With repeatable prompts, you can generate visuals in a uniform style for different manuals, keeping branding cohesive.
- Fewer dependencies on designers: For straightforward visuals, you can generate high-res, print-ready images yourself, freeing up your design team for more complex projects.
- Iterative edits made easy: Instead of going back and forth with designers, you can tweak images interactively via chat, refining hand positions, perspective, or background removal until it’s just right.
While it still needs polishing (it took a few attempts to get a clean result in real-world use), the trajectory is clear — once it works seamlessly, this could save you countless hours and production costs.
Pro tip: Start experimenting now. The better you get at crafting specific prompts, the more mileage you’ll get out of these tools when they fully mature.
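One way to make prompts repeatable, as suggested in the "consistent style" point above, is to keep the style wording fixed and only vary the subject. The following is a minimal sketch in Python, reusing the wording of our microphone prompt. The `LineArtRequest` class, its field names, and `build_prompt` are our own illustrative assumptions, not part of any Gemini API; the script only assembles text that you would paste into the chat.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineArtRequest:
    """Parameters that vary between illustrations (illustrative names)."""
    subject: str   # what the photo shows
    keep: str      # the element that must be preserved exactly
    exclude: str   # background items to leave out


def build_prompt(req: LineArtRequest) -> str:
    """Return a prompt with fixed style wording and variable subject details."""
    requirements = [
        "Style: simple, technical, black line-art without shading, "
        "suitable for inclusion in an instruction manual.",
        f"Keep: {req.keep}, exactly as in the photo.",
        f"Details: do not include background elements such as {req.exclude}.",
        "Perspective: maintain the same perspective and proportions.",
        "Output: plain white background, high resolution, suitable for print manuals.",
    ]
    header = (
        f"Please convert the attached photo of {req.subject} into a clean, "
        "line-art, hand-drawn style illustration."
    )
    return "\n".join([header, "Requirements:"] + [f"- {r}" for r in requirements])


# Example values taken from our microphone test
prompt = build_prompt(LineArtRequest(
    subject="the assembly of a microphone",
    keep="the hand position",
    exclude="the desk and drawers",
))
print(prompt)
```

Because the style lines never change, every illustration request starts from the same baseline, and any difference in the results is easier to trace back to the photo or the subject description.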
Google Gemini 2.5 Pro is here — and it’s a big deal for technical writers
What is the AI news?
Google just dropped Gemini 2.5 Pro, the latest upgrade to its flagship AI model, and it’s packed with new features. This version offers better reasoning, supports text, image, audio, and video inputs (aka multimodal), has stronger coding capabilities, and — get this — it’s now free for all users. Advanced subscribers still get perks like longer context windows and higher limits, but the core model is widely accessible.
Why it matters
If you’re writing docs, manuals, or help content, Gemini 2.5 Pro could seriously streamline your workflow. Better reasoning means it can now understand complex documentation requirements and generate more coherent and accurate content. Multimodal support opens the door to richer user manuals — think interactive guides with annotated images, or even audio-video walkthroughs. And its improved coding assistance means you can generate, explain, and document code snippets faster, with fewer errors.
How it helps you
- Better content generation: Stuck writing a tricky how-to section? Gemini 2.5 Pro can now better understand the structure and context of technical tasks, helping you draft step-by-step instructions that actually make sense to end users.
- Multimodal documentation: Want to level up your user guides? You can feed it screenshots or videos, and it can help you create text around them — like image captions, video descriptions, or interactive instructions.
- Code doc support: Need to write code samples or explain them in plain English? Gemini 2.5 Pro handles both like a pro. You can even use it to convert pseudocode into multiple programming languages, which is handy for multilingual or cross-platform documentation.
- Zero cost barrier: Since it’s now free to use (at least the core version), there’s no reason not to test it out in your current toolset — whether you’re drafting in Google Docs, Notion, or even Visual Studio Code.
OCR models spark debate
What’s the news?
Mistral AI just announced a new Optical Character Recognition (OCR) model, claiming it's the most accurate tool yet for extracting information from PDFs, graphs, and even handwritten notes. Given the OCR market is worth tens of billions of dollars, this is a big deal. But things heated up quickly: LlamaIndex responded with its own benchmarks, suggesting its workflows actually outperform Mistral's offering. So now there’s a lively back-and-forth over whose OCR system is truly the best, and more importantly, how we should measure that performance.
Why is it important for technical writers?
OCR is something technical writers brush up against constantly. Whether you're documenting legacy systems, converting scanned manuals, or organizing handwritten meeting notes, OCR plays a pivotal role. But here’s the kicker: Accuracy matters a lot. Misread characters can mean incorrect product specs, broken data tables, or mangled diagrams. This current debate underscores how tricky it is to truly evaluate OCR tools — especially when vendors cherry-pick benchmarks that favor their own products.
How can it help you as a technical writer?
For starters, this competition is driving better OCR tools, meaning fewer headaches when converting clunky PDFs or image-based docs. But it also gives you actionable takeaways:
- Benchmark scrutiny: Don’t blindly trust "industry-leading" claims. Both Mistral and LlamaIndex highlight the importance of how benchmarks are run. Before adopting a tool, dig into what kinds of documents they test on—are they similar to yours? PDFs with tables? Handwritten notes? This could influence which tool actually suits your workflow.
- Workflow improvements: LlamaIndex integrates OCR into broader AI workflows. You can explore chaining OCR with summarization, auto-tagging, or even generating structured documentation from raw PDFs. If you often deal with messy legacy docs, this might seriously streamline your process.
- Cost & ROI awareness: Both companies are in the game because of OCR’s revenue potential. That’s a reminder: Investing in better OCR (whether new software or API integrations) could save hours of manual work, making a strong business case to justify new tools in your tech stack.
In short, keep an eye on this space—not just for the tech drama, but because it might reshape how efficiently you extract, clean, and reuse information locked in tricky formats.
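If you want to follow the benchmark-scrutiny advice above on your own documents, a common yardstick is the character error rate (CER): the number of single-character edits needed to turn the OCR output into a hand-checked reference, divided by the length of the reference. The sketch below is a plain-Python illustration under that assumption. It is not the method Mistral or LlamaIndex used in their benchmarks, and the sample strings are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, ocr_output: str) -> float:
    """Edit distance between the two texts, per reference character."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, ocr_output) / len(reference)


# Hypothetical sample: a hand-checked line from a scanned spec sheet
reference = "Max torque: 10 Nm"
ocr_output = "Max torque: 1O Nm"  # OCR read the digit 0 as the letter O
cer = character_error_rate(reference, ocr_output)
print(f"CER: {cer:.3f}")
```

Running the same handful of representative pages (tables, handwriting, scanned manuals) through each candidate tool and comparing the scores gives you a number grounded in your own content rather than in a vendor's test set.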
Anthropic’s new text editor tool: simplifying AI-powered file editing
What’s the news?
Anthropic has just released a text editor tool that makes working with AI in file editing much easier and more intuitive. It’s designed to streamline how you apply, review, and refine AI-generated edits directly within your files — helping you edit and iterate smoothly without needing to jump between tools or manually tweak every suggestion.
Why is it important?
As technical writers, we’re often neck-deep in documentation, code snippets, release notes, and structured content. Editing and refining these documents — especially large ones — can be time-consuming. Anthropic’s tool removes the back-and-forth hassle by allowing you to apply, review, and iterate AI-generated edits directly inside a file, without needing to copy-paste or manually tweak each suggestion. It makes the editing process faster, cleaner, and more controlled, which is key when accuracy is non-negotiable.
How can it help?
Here’s where it gets exciting:
- Effortless bulk edits: Quickly apply AI-driven changes across entire documents — whether that’s fixing terminology, reformatting content, or updating legacy information — and iterate on those changes until they’re just right.
- Seamless version control: Because the edits happen inside your files, you can track each change more quickly, making it simple to manage revisions in tools like Git.
- Customisable to your workflow: Whether you’re editing API docs, markdown files, or manuals, this tool is flexible enough to adapt to your preferred writing setup.
- Focus on strategy, not repetition: You can offload tedious, repetitive edits to the AI and focus more on content strategy, accuracy, and user experience.
In short, Anthropic’s editor tool acts like an AI-savvy writing assistant embedded directly in your file workflow. If you’ve ever wished for a smarter, faster way to polish up technical docs without endless rounds of manual revision, this could be the solution. Keep an eye on this — integrating it into your toolkit might just shave hours off your editing process!
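To get a feel for the "apply, then review" loop described above, here is a minimal local sketch in Python. It does not call Anthropic's tool. The `replace_term` function is a simple stand-in for an AI-proposed terminology fix, and `review_diff` uses Python's standard `difflib` module to show the change in the same unified-diff format that Git uses. The file name and document text are hypothetical.

```python
import difflib
import re


def replace_term(text: str, old: str, new: str) -> str:
    """Replace whole-word occurrences of a term (stand-in for an AI-proposed edit)."""
    return re.sub(rf"\b{re.escape(old)}\b", new, text)


def review_diff(before: str, after: str, name: str = "guide.md") -> str:
    """Return a unified diff so each proposed change can be checked before saving."""
    return "".join(difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{name}",
        tofile=f"b/{name}",
    ))


# Hypothetical document content
original = (
    "Open the control panel.\n"
    "The log-in page appears.\n"
    "Enter your log-in details.\n"
)
proposed = replace_term(original, "log-in", "login")
print(review_diff(original, proposed))
```

Reviewing the diff before accepting a bulk edit keeps a human in the loop, which matters when accuracy is non-negotiable. An empty diff means the proposed edit changed nothing.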
Otio helps summarise and converse with your documents and links, enabling you to write, edit, and paraphrase using AI
Otio is a new AI tool designed to help you summarise, edit, and paraphrase documents and links effortlessly. With its ability to interact with your content, Otio makes it easier to quickly extract key information, refine your writing, and rephrase for clarity or style.
Why it matters
As a technical writer, this tool could save time by helping you summarise long manuals, reports, or research papers. It’s perfect for editing content to improve readability, ensuring that technical documentation is clear, concise, and easy to follow.
How it helps you
- Quick content summarisation – Save time on lengthy documents by generating concise summaries.
- Effortless paraphrasing – Refine your technical content by rephrasing complex sections.
- Faster editing – Streamline the editing process, improving your productivity in content creation.
Foxconn launches FoxBrain, an AI model to optimise manufacturing and supply chain operations
Foxconn, famous for manufacturing the iPhone, has launched FoxBrain, a new AI large language model designed to optimise manufacturing and supply chain operations. This model aims to improve efficiency by analysing data in real time, predicting potential issues, and automating various processes across production lines. The goal is to reduce costs, enhance decision-making, and improve overall operational effectiveness within Foxconn’s vast manufacturing network.
For technical writers, this development underscores the growing role of AI in industries like manufacturing, where efficiency and precision are paramount. As more companies adopt AI to streamline operations, the demand for clear, concise documentation and user guides will rise. Understanding how AI models like FoxBrain work can help technical writers create more targeted materials that explain AI-driven processes to various audiences, including engineers, operations teams, and executives.
Adobe announced AI gizmos at their convention
Adobe has launched several AI-powered tools to streamline content creation and automation. The Adobe Experience Platform Agent Orchestrator helps AI agents work together across different ecosystems (AWS, IBM, Microsoft, SAP). GenStudio Foundation centralises content and analytics, while Experience Manager Sites Optimizer automates website issue detection. Additionally, Firefly Services now includes APIs for translation, lip sync, video formatting, and branded content generation.
Why it matters
With AI automating marketing and content workflows, documentation processes could shift toward managing AI-driven content strategies rather than manual content production.
How it helps
- Streamlined content tracking – GenStudio makes it easier to manage and update content.
- Faster issue detection – Experience Manager Sites Optimizer can help writers improve web-based docs.
Stable Virtual Camera is a release from Stability AI: a model for generating 3D video from 2D images
A new research preview has unveiled a model that enables the creation of realistic 3D videos from standard 2D images, eliminating the need for complex scene reconstruction. It uses multiview diffusion to transform conventional images into 3D volumetric videos with natural depth and perspective. The model supports dynamic camera movements, offering 14 unique trajectories, including 360°, Spiral, and Dolly Zoom. This creates cinematic effects and smooth motion transitions, even in long videos.
Why it matters
With the ability to turn basic images into 3D videos, this tool could change how you document visual content creation. As interactive media becomes more integrated into documentation, AI-powered 3D video generation could enhance tutorials, product demos, and technical guides by providing more dynamic, engaging visuals.
How it helps you
- Engage users with 3D visuals – Use 3D video generation for product demonstrations or complex concepts.
- Improve user guides – Incorporate smoother camera movements to enhance step-by-step tutorials or troubleshooting videos.
- Faster visual content creation – Simplifies the process of generating 3D content without specialised hardware or complex software.
Nvidia’s Blackwell Ultra AI chip
Nvidia just unveiled the Blackwell Ultra AI chip at GTC, along with new GPU and CPU platforms designed to supercharge AI performance. This means faster, more efficient AI models across industries — including the tools technical writers use daily.
Why it matters
- AI writing assistants (like ChatGPT) will get smarter and faster.
- Translation and localisation will improve, making multilingual documentation easier.
- AI-driven search and content retrieval will be more efficient.
How it helps you
- Faster, higher-quality AI-generated drafts.
- Better AI-powered summarisation for complex docs.
- More innovative tools for organising and formatting content.
Bottom line: Expect better AI-powered writing tools soon!
Mistral and Cohere both have new models in the market — if you’re interested in local models
Both Mistral and Cohere just released new AI models, offering strong alternatives to OpenAI and Google — especially if you prefer local AI models. These models are designed for high performance while allowing more control over data and privacy. As technical writing often involves dealing with sensitive information, privacy is key.
Why it matters
- More options for AI-powered writing and research.
- Local models offer better privacy and customisation.
- Competition drives innovation, improving AI tools overall.
How it helps you
- Use AI assistance without relying on cloud-based models.
- Potential for faster, offline AI tools in documentation workflows.
- Greater flexibility in choosing an AI model that suits your needs.
Bottom line: More AI choices mean better tools for technical writers — especially for those prioritising local AI and data privacy!
Notion AI adds auto-tags and summaries
What is the AI news?
Notion AI just rolled out some seriously handy upgrades: it can now automatically summarize documents, suggest structure, and auto-tag pages based on content. The goal? Reduce friction in organizing your workspace and surface key info faster. It’s part of Notion’s bigger push to become not just a note-taking app, but an intelligent writing and knowledge management assistant.
Why it matters
If you've ever tried wrangling a growing knowledge base, you know how tedious it is to keep everything tagged, categorized, and summarized. Technical writers in fast-paced environments — where dozens of pages and specs are added weekly — spend a lot of time curating docs. With these new features, Notion AI is stepping in as your content assistant, doing the grunt work of labeling and condensing information automatically.
That’s a big deal for writers dealing with internal wikis, product docs, or onboarding flows that need to stay current and searchable. Less time tagging = more time writing (or breathing).
How it helps you
Think of these updates as your shortcut to a tidier, more searchable workspace:
- Summarize long docs: Create TL;DRs for internal specs, meeting notes, or onboarding manuals.
- Auto-tag for search: Let Notion suggest relevant tags to improve searchability without the mental effort.
- Structure suggestions: Got a rambling doc? Notion can now propose headings and sections to clean it up.
Use case: You’ve got a 10-page onboarding doc and someone new is joining tomorrow. Just ask: “Summarize this as a TL;DR for our internal wiki,” and let AI do the heavy lifting.
In short, Notion’s becoming more than a note-taker — it's becoming a second brain for your documentation system. And that’s something every technical writer should keep an eye on.
OpenAI's 'Operator' AI Assistant
What is the AI news?
OpenAI just dropped Operator — a semi-autonomous AI assistant designed to complete real-world tasks like booking tickets, ordering groceries, and navigating online services on your behalf. It’s currently rolling out to ChatGPT Pro users, and OpenAI is being careful with how far it goes: Operator is sandboxed to avoid sketchy sites or unsafe behavior, keeping the assistant reliable and secure.
Why it matters
This isn’t just another chatbot — it’s an AI that can actually do things for users. And that shift from passive assistance (like answering questions) to active execution (like placing orders) changes the game. For technical writers, it signals a future where documenting how an AI behaves, what permissions it needs, and how it interacts with systems becomes part of your regular scope.
Expect user guides and FAQs that sound less like “Click here to…” and more like “Here’s how to delegate tasks to your AI.” This is the kind of emerging tech where the documentation stakes are high: transparency, safety, and usability are everything.
How it helps you
Operator is still early, but here’s how it could impact your day-to-day:
- Get ahead on AI UX: The more you understand how tools like Operator work, the better positioned you’ll be to document them or even contribute to their UX copy and help systems.
- Write for delegation: Practice explaining how users can instruct an AI rather than navigate a UI. Think “How to phrase commands so Operator books the right flight,” not “Where to click to search flights.”
- Redefine support content: As AI agents get smarter, your troubleshooting docs will need to evolve — from debugging user errors to diagnosing AI missteps.
In short, Operator is a sign of what’s coming: AI that acts. As it moves from novelty to norm, the need for smart, thoughtful documentation around these tools is only going to grow.
GitHub Copilot Workspace expands early access
What is the AI news?
GitHub is opening up its Copilot Workspace to more users, moving it beyond limited preview. This tool lets you describe a task in plain English — like “Add a dark mode toggle” — and then auto-generates not just the code but also related documentation like inline comments and README sections. It’s like having an AI dev buddy who also writes doc drafts on the fly.
Why it matters
For technical writers, timing is everything. Too often, we’re stuck waiting for features to be finalised or for engineering to hand off the latest build. Copilot Workspace flips that script. Now you can see how a feature might be implemented before it lands, giving you a head start on API references, tutorials, or configuration guides.
And because it generates documentation alongside the code, you’re not starting from scratch. Even if the output needs refinement, it’s a huge time-saver — especially for smaller teams or solo writers juggling multiple projects.
How it helps you
Here’s what you can do with Copilot Workspace today:
- Prototype faster: Type in a prompt like “Enable export to CSV from the dashboard,” and Copilot will spin up the code and the usage notes.
- Jumpstart tutorials: Use the AI-generated README or inline docs as your base, then polish and expand into full-length guides.
- Bridge the dev-doc gap: Preview how a feature is built so you’re not writing in the dark or waiting on back-and-forth.
Try this: “Add OAuth login to the mobile app.” You’ll get sample code, explanatory comments, and a scaffolded README. From there, it's just a few edits to turn it into a release-ready doc.
Bottom line? Copilot Workspace isn’t just a dev tool — it’s a content springboard. If you’re part of a team that ships fast, this might be your new secret weapon.
Tencent’s open-source 3D generation tools
What is the AI news?
On March 18, 2025, Tencent released a set of open-source AI tools powered by Hunyuan3D-2.0, which can turn text and image inputs into high-quality 3D visuals in as little as 30 seconds. It’s part of Tencent’s broader effort to provide fast, affordable, and high-performance generative AI tools to creators and developers alike.
Why it matters
3D content creation has traditionally been slow, complex, and expensive — often requiring specialised design skills and software. Tencent’s tools simplify the process, signalling a shift toward faster, more automated asset generation. For technical writers, it’s a peek into a future where you may no longer need a designer to generate visuals for documentation. If you can describe it, you can render it.
This opens the door to more dynamic documentation — especially in areas like gaming, simulation, or hardware where 3D visuals can significantly enhance understanding. And because it’s open-source, the tools are accessible for experimentation without breaking your budget.
How it helps you
Tencent’s toolset could change your workflow in some very practical ways:
- Add 3D to your toolbox: Generate visuals for product components, device diagrams, or UI mockups — without touching a design program.
- Explain in 3D: Help users understand spatial relationships, product assemblies, or motion using auto-generated 3D visuals.
- Prototype fast: Need visual examples for a tutorial or training doc? Describe it in text and get a 3D model in seconds.
In short, Tencent’s 3D tools are making visual documentation faster, smarter, and a lot more accessible.
Roblox’s Cube 3D model
What is the AI news?
Roblox has launched Cube 3D, an open-source AI model trained on 3D object data that uses a token-based prediction system to generate 3D meshes. It’s optimised for efficiency and will soon support image input and integrations with existing AI creation platforms.
Why it matters
As immersive content and interactive environments become more common, being able to produce 3D objects quickly is a serious win. For technical writers, Cube 3D makes it easier to support workflows that require 3D elements — think VR training modules, game documentation, or architectural overviews.
It also reflects a broader trend: AI is shifting from text generation to object creation, and open-source access gives you a front-row seat to explore it.
How it helps you
Here’s how Cube 3D could slot into your documentation stack:
- Create once, use everywhere: Generate 3D assets for use in docs, demos, or interactive tutorials.
- Visualise workflows: Use 3D meshes to represent parts, layouts, or environments that are hard to describe in 2D.
- Experiment freely: Being open-source, it’s perfect for testing without needing enterprise tools or licenses.
Cube 3D puts mesh modelling within reach, even for non-designers. That’s a huge plus for anyone working on visual-first content, although it won't be directly relevant for most technical writers. If you're exploring 3D modelling options that align better with dev workflows, a tool like Zoo.dev might be a better fit for your needs.
Ferry Vermeulen
Founder of INSTRKTIV and keen to help users become experts in the use of a product, and thus to contribute to a positive user experience. Eager to help organisations reduce their product liability. Just loves cooking, travel, and music, especially electronic. Follow Ferry on LinkedIn.

