Rails HABTM with Unique Scope and Select Columns

This one left me scratching my head for a while. Not quite sure I would call it a bug, but this one definitely has a GOTCHA.

Let say you have a Users model and Units model. Each can have_and_belong_to_many of each other. But, you want to make sure there are no multiple User->Unit relationships, or Unit->User relationships. So you use the association scope -> { uniq } .

It won’t work. Here is why.

Models

So we have these models:

class Unit < ActiveRecord::Base

has_and_belongs_to_many :users, -> { uniq }

end

class User < ActiveRecord::Base

has_and_belongs_to_many :units, -> { uniq }

end

And the relationship table:

class CreateUnitsUsers < ActiveRecord::Migration

def change

create_table :units_users, id: false do |t|

t.references :unit

t.references :user

end

add_index :units_users, [:unit_id, :user_id]

add_index :units_users, [:user_id, :unit_id]

end

You can read more about ActiveRecord associations and queries at Rails Guides.

Queries

When you query a Unit for Users, you get only DISTINCT users, and vice-versa.

> Unit.find(8).users.count

Unit Load (0.2ms) SELECT "units".* FROM "units" WHERE "units"."id" = ? LIMIT 1 [["id", 8]]

(0.5ms) SELECT DISTINCT COUNT(DISTINCT "users"."id") FROM "users" INNER JOIN "units_users" ON "users"."id" = "units_users"."user_id" WHERE "units_users"."unit_id" = ? [["unit_id", 8]]

=> 130

> User.find_by_first_name_and_last_name('Karl', 'Smith').units.count

User Load (0.4ms) SELECT "users".* FROM "users" WHERE "users"."first_name" = 'Karl' AND "users"."last_name" = 'Smith' LIMIT 1

(0.3ms) SELECT DISTINCT COUNT(DISTINCT "units"."id") FROM "units" INNER JOIN "units_users" ON "units"."id" = "units_users"."unit_id" WHERE "units_users"."user_id" = ? [["user_id", 617]]

=> 1

But All is Not Well

Let’s check out what happens if we use a .select(:first_name) with the Unit.users query.

> Unit.find(8).users.select(:first_name).count

Unit Load (0.1ms) SELECT "units".* FROM "units" WHERE "units"."id" = ? LIMIT 1 [["id", 8]]

(0.4ms) SELECT DISTINCT COUNT(DISTINCT "users"."first_name") FROM "users" INNER JOIN "units_users" ON "users"."id" = "units_users"."user_id" WHERE "units_users"."unit_id" = ? [["unit_id", 8]]

=> 108

Whoa there… our count of users went from 130 (the correct number) to just 108. Why? Take a look at the SQL with the .select(:first_name) query. Specifically the SELECT DISTINCT COUNT(DISTINCT "users"."first_name") . The query that ActiveRecord is generating is doing a DISTINCT (alias for uniq) on the "users"."first_name" . This query will only find users with DISTINCT first names. What if you have 3 people with the first name of “John”. Well, it will only find and return 1 of those John’s.

This is not what I expected. POLA any one?

Documentation

Rails Guides has a good section on Selecting Specific Fields that goes into detail about using .select() with .unique . But it doesn’t mention that using the -> { uniq } scope on an ActiveRecord association will have the same affect. Well, it does.

So, it got me to thinking… could I add -> { distinct(:id) } to the HABTM declaration? Yes you can. But it doesn’t work with .select() . For some reason when you use .select() ActiveRecord will override any column declarations you used with the distinct() .

Proof eh?

class User < ActiveRecord::Base

has_and_belongs_to_many :units, -> { distinct(:id) }

end

class Unit < ActiveRecord::Base

has_and_belongs_to_many :users, -> { distinct(:id) }

end

> Unit.find(8).users.count

Unit Load (0.1ms) SELECT "units".* FROM "units" WHERE "units"."id" = ? LIMIT 1 [["id", 8]]

(0.4ms) SELECT DISTINCT COUNT(DISTINCT "users"."id") FROM "users" INNER JOIN "units_users" ON "users"."id" = "units_users"."user_id" WHERE "units_users"."unit_id" = ? [["unit_id", 8]]

=> 130

> Unit.find(8).users.select(:first_name).count

Unit Load (0.1ms) SELECT "units".* FROM "units" WHERE "units"."id" = ? LIMIT 1 [["id", 8]]

(0.4ms) SELECT DISTINCT COUNT(DISTINCT "users"."first_name") FROM "users" INNER JOIN "units_users" ON "users"."id" = "units_users"."user_id" WHERE "units_users"."unit_id" = ? [["unit_id", 8]]

=> 108

I find it interesting that the first query, without specifying a .select() generates the correct DISTINCT query with the "users"."id" column. But as you can see on the second query, it is overriding the specified scope with the .select() columns.

This is my logic: if I use .select() that doesn’t necessarily mean I want DISTINCT columns, regardless if weather I specify a .distinct() or not. In the above case, .select() always includes the :id column wether or not you are also using .distinct() . So why would the .select() override my .distinct(:id) declaration?

Solution?

First solution is to not use .select() with a query. Duh.

But I really like .select() . It’s a great way to reduce memory size when instantiating hundreds or thousands of User ActiveRecord objects. And in my real-world use case, the User table has a couple of text columns that contain large amounts of html. So I like to use .select() where ever I can.

So I guess there really isn’t a solution that I know of, except not to use .select() with ActiveRecord queries on HABTM tables that use the -> { uniq } scope.

Bummer.

Best Solution!

You knew it was coming, didn’t cha. I wouldn’t leave you hanging.

The best solution is to not use the -> { uniq } scope with HABTM relations, but enforce the uniqueness condition at the database level with unique compound indexes.

In your migration for the relationship table just add unique: true to the index declaration.

class CreateUnitsUsers < ActiveRecord::Migration

def change

create_table :units_users, id: false do |t|

t.references :unit

t.references :user

end

add_index :units_users, [:unit_id, :user_id], unique: true

add_index :units_users, [:user_id, :unit_id], unique: true

end

2 Responses

Kyle August 4, 2014 at 2:59 pm
You should really mention the database you are using here, makes a big difference in the sql that is outputted.
- Karl August 5, 2014 at 8:41 am
  Good point. I was using Postgres for these examples. But Rails will automatically generated the correct SQL regardless of database you are using.

Rails HABTM with Unique Scope and Select Columns

Models

Queries

But All is Not Well

Documentation

Solution?

Best Solution!

Related

2 Responses