Skip to content

Bug: File format validation occurs after encoding when using auto force_encoding #206

@Skumring

Description

@Skumring

Bug: File format validation occurs after encoding when using auto force_encoding

Description

When force_encoding: :auto is configured, the file encoding process runs BEFORE file format validation. This causes the system to attempt encoding operations on invalid file formats (e.g., .txt files), leading to unnecessary processing overhead and incorrect error handling flow.


Reproduction Script

require 'bundler/inline'

gemfile(true) do
  source 'https://rubygems.org'

  gem 'rails', '~> 7.0.0'
  gem 'sqlite3', '~> 1.6.0'
  gem 'activeadmin', '~> 2.9'
  gem 'active_admin_import', '5.1.0'
  gem 'rchardet', '~> 1.8'
  gem 'rubyzip', '~> 2.3'
  gem 'activerecord-import', '~> 1.5'
end

require 'active_record'
require 'action_controller/railtie'
require 'active_admin'
require 'active_admin_import'
require 'tempfile'

ActiveRecord::Base.establish_connection(adapter: 'sqlite3', database: ':memory:')
ActiveRecord::Base.logger = Logger.new(STDOUT)

ActiveRecord::Schema.define do
  create_table :authors, force: true do |t|
    t.string :name
    t.string :last_name
    t.timestamps
  end
end

class Author < ActiveRecord::Base
  validates_presence_of :name
end

# Mock ActionDispatch for testing
module ActionDispatch
  module Http
    class UploadedFile
    end
  end
end

puts "\n=== Testing Validation Order Issue ===\n"

# Create a test file with .txt extension (invalid format)
invalid_file = Tempfile.new(['test', '.txt'])
invalid_file.write("Name,Last name,Birthday\nJohn,Doe,1986-05-01")
invalid_file.rewind

# Mock uploaded file with text/plain content type
uploaded_file = Struct.new(:present?, :path, :tempfile, :content_type) do
  def is_a?(klass)
    klass == ActionDispatch::Http::UploadedFile
  end
end.new(true, invalid_file.path, invalid_file, 'text/plain')

# Create model with auto encoding detection
model = ActiveAdminImport::Model.new(force_encoding: :auto, file: uploaded_file)

puts "Force encoding enabled: #{model.force_encoding?}"
puts "File format: #{uploaded_file.content_type}"
puts "Expected: Should reject .txt file BEFORE attempting encoding\n\n"

# The bug: encode_file callback runs BEFORE format validation
# In the old code:
#   before_validation :encode_file
#   validate :correct_content_type
#
# This means encode_file processes the .txt file before we check if it's valid CSV!

# Track if encode_file was called
encode_called = false
model.define_singleton_method(:encode_file) do
  encode_called = true
  puts "❌ BUG: encode_file was called BEFORE validation!"
  super()
end

model.valid?
puts "\nModel valid: #{model.valid?}"
puts "Errors: #{model.errors.full_messages}"
puts "encode_file was called: #{encode_called}"

invalid_file.close
invalid_file.unlink

puts "\n✅ EXPECTED: Validation should reject the file BEFORE encode_file is called"

Steps to Reproduce (in real application)

  1. Configure import with auto encoding detection:
    ActiveAdmin.register Author do
      active_admin_import force_encoding: :auto
    end
  2. Visit /admin/authors/import
  3. Upload a .txt file (or any non-CSV format like .doc, .pdf, etc.)
  4. Observe that the system attempts to encode the file before validation

Expected Behavior

The system should reject the file with error: "You can import only valid csv file" BEFORE attempting any encoding operations.

The validation lifecycle should be:

  1. ✅ Validate file format (correct_content_type)
  2. ✅ If valid, then encode file (encode_file)

Actual Behavior

The encode_file method runs BEFORE format validation completes, causing:

  • Unnecessary processing overhead: Invalid files are processed unnecessarily
  • Potential encoding errors: Non-CSV files may cause encoding exceptions
  • Incorrect error handling flow: Encoding happens before we know the file is valid
  • Performance degradation: Large invalid files are read and processed

The incorrect lifecycle with the bug:

  1. ❌ Encode file (encode_file runs in before_validation)
  2. ❌ Validate file format (correct_content_type)

Root Cause

In model.rb:39, the callback order was incorrect:

before_validation :encode_file, if: ->(me) { me.force_encoding? && me.file.present? }
validate :correct_content_type

The encode_file callback is registered as a before_validation hook, which means it runs before any validations, including the correct_content_type validation that checks file format.


Environment

  • Ruby: 3.2.1
  • Rails: 7.0.0+ (also affects 5.2, 6.0, 6.1, 7.1)
  • ActiveAdmin: 2.9.0
  • active_admin_import: 5.1.0 (all versions prior to the fix are affected)
  • Database: SQLite3 1.6.0 (but affects all databases)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions