It's 2024 and we have tools like ffmpeg, ImageMagick and GPT readily available. With them, it's easy to convert text, images, audio and video clips into each other.
For everyday use without any parameter tweaking, I keep a collection of tiny scripts in my ~/bin folder that can then be used like regular shell commands. And: it's faster to use the CLI than interacting with a website, and cheaper to use the API than buying GPT Plus. :-)
Usage
- text-to-image "parmiggiano cheese wedding cake, digital art"
- text-to-audio "Yesterday I ate some tasty parmiggiano cheese at a wedding. It was the cake!" cake.mp3
- audio-to-text /path/to/cake.mp3
- image-to-text /path/to/cake.jpg
- video-to-text /path/to/cake.mp4
- video-to-video /path/to/rickroll.mov rickroll.mp4
- video-to-audio /path/to/cake.mp4 cake.mp3
- audio-to-audio /path/to/cake.mp3 cake.aac
- image-to-image /path/to/cake.png cake.jpg
stateDiagram-v2
    text --> image: Dall-E 3
    text --> audio: GPT TTS
    image --> text: GPT Vision
    audio --> audio: ffmpeg
    audio --> text: GPT STT
    video --> text: GPT STT
    video --> audio: ffmpeg
    video --> video: ffmpeg
    image --> image: imagemagick
Prerequisites
- ~/bin should be part of your $PATH
- The environment variable $OPENAI_API_KEY must be populated with a valid and charged API key
- The Ruby version used by the scripts needs the ruby-openai gem (run gem install ruby-openai once)
- For video-to-X you need an ffmpeg binary in your $PATH
- For image-to-X you need a convert binary (ImageMagick) in your $PATH
- The files below must be executable (chmod +x)
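Putting the prerequisites together, a one-time setup could look like this (a sketch assuming bash; the API key value is a placeholder):

```shell
# Create the script folder and put it on $PATH for this session.
# Add the export line to ~/.bashrc (or ~/.zshrc) to make it permanent.
mkdir -p ~/bin
export PATH="$HOME/bin:$PATH"

# The GPT-backed scripts read the API key from this variable.
export OPENAI_API_KEY="sk-your-key-here"

# One-time: install the Ruby client used by the GPT-backed scripts.
# gem install ruby-openai
```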
Scripts
Note
All GPT-backed commands below cost money. Not much, though: usually less than one cent per call!
~/bin/text-to-image 
#!/usr/bin/env ruby
require 'openai'
prompt = ARGV[0]
if prompt.to_s.strip == ''
  puts 'Usage: text-to-image "parmiggiano cheese wedding cake, digital art"'
  exit
end
client = OpenAI::Client.new(access_token: ENV.fetch('OPENAI_API_KEY'))
puts client.images.generate(parameters: { prompt: prompt, model: 'dall-e-3', size: '1024x1024' }).dig("data", 0, "url")
~/bin/text-to-audio 
#!/usr/bin/env ruby
require 'openai'
prompt = ARGV[0]
output_path = ARGV[1] || 'output.mp3'
if prompt.to_s.strip == ''
  puts 'Usage: text-to-audio "Yesterday I ate some tasty parmiggiano cheese at a wedding. It was the cake!" cake.mp3'
  exit
end
client = OpenAI::Client.new(access_token: ENV.fetch('OPENAI_API_KEY'))
response = client.audio.speech(parameters: { input: prompt, model: 'tts-1', voice: 'alloy' })
File.binwrite(output_path, response)
puts "You can find the TTS result at #{output_path}"
~/bin/audio-to-text 
#!/usr/bin/env ruby
require 'openai'
audio_path = ARGV[0]
if audio_path.to_s.strip == ''
  puts 'Usage: audio-to-text /path/to/techno.mp3'
  exit
end
client = OpenAI::Client.new(access_token: ENV.fetch('OPENAI_API_KEY'))
puts client.audio.transcribe(parameters: { model: 'whisper-1', file: File.open(audio_path, 'rb') }).dig("text")
~/bin/image-to-text 
#!/usr/bin/env ruby
require 'openai'
require 'base64'
image_path = ARGV[0]
if image_path.to_s.strip == ''
  puts 'Usage: image-to-text /path/to/cake.jpg'
  exit
end
# strict_encode64 avoids the newlines that encode64 inserts, which would break the data URL
base64_image = Base64.strict_encode64(File.binread(image_path))
client = OpenAI::Client.new(access_token: ENV.fetch('OPENAI_API_KEY'))
puts client.chat(parameters: {
  model: 'gpt-4-vision-preview',
  messages: [{
    role: 'user',
    content: [
     { "type": "text", "text": "What’s in this image?"},
     { "type": "image_url", "image_url": { "url": "data:image/jpeg;base64,#{base64_image}" }},
    ]
  }]
}).dig("choices", 0, "message", "content")
~/bin/video-to-text 
GPT is also able to transcribe videos, so this is just an alias for audio-to-text.
But: It's probably cheaper and faster to use video-to-audio and then pipe the result to audio-to-text.
ln -s ~/bin/audio-to-text ~/bin/video-to-text
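The cheaper route suggested above (extract the audio locally, then transcribe only that) can be wrapped in a small helper function; a sketch assuming video-to-audio and audio-to-text are on your $PATH (the function name is made up):

```shell
# video-to-text-cheap: extract the audio track locally with ffmpeg,
# then send only the much smaller audio file to the API.
video-to-text-cheap() {
  local video_path="$1"
  local tmp_audio="${TMPDIR:-/tmp}/video-to-text-$$.mp3"
  video-to-audio "$video_path" "$tmp_audio" >/dev/null
  audio-to-text "$tmp_audio"
  rm -f "$tmp_audio"
}
```

Usage: video-to-text-cheap /path/to/cake.mp4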
~/bin/video-to-video 
#!/usr/bin/env ruby
input_path = ARGV[0]
output_filename = ARGV[1]
if input_path.to_s.strip == '' || output_filename.to_s.strip == ''
  puts 'Usage: video-to-video /path/to/rickroll.mov rickroll.mp4'
  exit
end
# pass arguments as a list so paths with spaces or shell metacharacters survive
system('ffmpeg', '-i', input_path, output_filename)
puts "File transcoded to #{output_filename}"
~/bin/video-to-audio 
#!/usr/bin/env ruby
video_path = ARGV[0]
output_filename = ARGV[1] || 'output.mp3'
if video_path.to_s.strip == ''
  puts 'Usage: video-to-audio /path/to/rickroll.mp4 rickroll.mp3'
  exit
end
# pass arguments as a list so paths with spaces or shell metacharacters survive
system('ffmpeg', '-i', video_path, '-vn', '-acodec', 'libmp3lame', '-q:a', '4', output_filename)
puts "File transcoded to #{output_filename}"
~/bin/audio-to-audio 
#!/usr/bin/env ruby
input_path = ARGV[0]
output_filename = ARGV[1]
if input_path.to_s.strip == '' || output_filename.to_s.strip == ''
  puts 'Usage: audio-to-audio /path/to/rickroll.mp3 rickroll.aac'
  exit
end
# pass arguments as a list so paths with spaces or shell metacharacters survive
system('ffmpeg', '-i', input_path, '-vn', '-q:a', '4', output_filename)
puts "File transcoded to #{output_filename}"
~/bin/image-to-image 
#!/usr/bin/env ruby
input_path = ARGV[0]
output_filename = ARGV[1]
if input_path.to_s.strip == '' || output_filename.to_s.strip == ''
  puts 'Usage: image-to-image /path/to/cake.png cake.jpg'
  exit
end
# pass arguments as a list so paths with spaces or shell metacharacters survive
system('convert', input_path, output_filename)
puts "File transcoded to #{output_filename}"
Posted by Michael Leimstädtner to makandra dev (2024-03-14 16:06)