This one left me scratching my head for a while. Not quite sure I would call it a bug, but this one definitely has a GOTCHA.
Let say you have a Users model and Units model. Each can have_and_belong_to_many of each other. But, you want to make sure there are no multiple User->Unit relationships, or Unit->User relationships. So you use the association scope -> { uniq } .
It won’t work. Here is why.
Models
So we have these models:
1 2 3 4 5 6 7 |
class Unit < ActiveRecord::Base has_and_belongs_to_many :users, -> { uniq } end class User < ActiveRecord::Base has_and_belongs_to_many :units, -> { uniq } end |
And the relationship table:
1 2 3 4 5 6 7 8 9 10 |
class CreateUnitsUsers < ActiveRecord::Migration def change create_table :units_users, id: false do |t| t.references :unit t.references :user end add_index :units_users, [:unit_id, :user_id] add_index :units_users, [:user_id, :unit_id] end end |
You can read more about ActiveRecord associations and queries at Rails Guides.
Queries
When you query a Unit for Users, you get only DISTINCT users, and vice-versa.
1 2 3 4 5 6 7 8 9 |
> Unit.find(8).users.count Unit Load (0.2ms) SELECT "units".* FROM "units" WHERE "units"."id" = ? LIMIT 1 [["id", 8]] (0.5ms) SELECT DISTINCT COUNT(DISTINCT "users"."id") FROM "users" INNER JOIN "units_users" ON "users"."id" = "units_users"."user_id" WHERE "units_users"."unit_id" = ? [["unit_id", 8]] => 130 > User.find_by_first_name_and_last_name('Karl', 'Smith').units.count User Load (0.4ms) SELECT "users".* FROM "users" WHERE "users"."first_name" = 'Karl' AND "users"."last_name" = 'Smith' LIMIT 1 (0.3ms) SELECT DISTINCT COUNT(DISTINCT "units"."id") FROM "units" INNER JOIN "units_users" ON "units"."id" = "units_users"."unit_id" WHERE "units_users"."user_id" = ? [["user_id", 617]] => 1 |
But All is Not Well
Let’s check out what happens if we use a .select(:first_name) with the Unit.users query.
1 2 3 4 |
> Unit.find(8).users.select(:first_name).count Unit Load (0.1ms) SELECT "units".* FROM "units" WHERE "units"."id" = ? LIMIT 1 [["id", 8]] (0.4ms) SELECT DISTINCT COUNT(DISTINCT "users"."first_name") FROM "users" INNER JOIN "units_users" ON "users"."id" = "units_users"."user_id" WHERE "units_users"."unit_id" = ? [["unit_id", 8]] => 108 |
Whoa there… our count of users went from 130 (the correct number) to just 108. Why? Take a look at the SQL with the .select(:first_name) query. Specifically the SELECT DISTINCT COUNT(DISTINCT "users"."first_name") . The query that ActiveRecord is generating is doing a DISTINCT (alias for uniq) on the "users"."first_name" . This query will only find users with DISTINCT first names. What if you have 3 people with the first name of “John”. Well, it will only find and return 1 of those John’s.
This is not what I expected. POLA any one?
Documentation
Rails Guides has a good section on Selecting Specific Fields that goes into detail about using .select() with .unique . But it doesn’t mention that using the -> { uniq } scope on an ActiveRecord association will have the same affect. Well, it does.
So, it got me to thinking… could I add -> { distinct(:id) } to the HABTM declaration? Yes you can. But it doesn’t work with .select() . For some reason when you use .select() ActiveRecord will override any column declarations you used with the distinct() .
Proof eh?
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
class User < ActiveRecord::Base has_and_belongs_to_many :units, -> { distinct(:id) } end class Unit < ActiveRecord::Base has_and_belongs_to_many :users, -> { distinct(:id) } end > Unit.find(8).users.count Unit Load (0.1ms) SELECT "units".* FROM "units" WHERE "units"."id" = ? LIMIT 1 [["id", 8]] (0.4ms) SELECT DISTINCT COUNT(DISTINCT "users"."id") FROM "users" INNER JOIN "units_users" ON "users"."id" = "units_users"."user_id" WHERE "units_users"."unit_id" = ? [["unit_id", 8]] => 130 > Unit.find(8).users.select(:first_name).count Unit Load (0.1ms) SELECT "units".* FROM "units" WHERE "units"."id" = ? LIMIT 1 [["id", 8]] (0.4ms) SELECT DISTINCT COUNT(DISTINCT "users"."first_name") FROM "users" INNER JOIN "units_users" ON "users"."id" = "units_users"."user_id" WHERE "units_users"."unit_id" = ? [["unit_id", 8]] => 108 |
I find it interesting that the first query, without specifying a .select() generates the correct DISTINCT query with the "users"."id" column. But as you can see on the second query, it is overriding the specified scope with the .select() columns.
This is my logic: if I use .select() that doesn’t necessarily mean I want DISTINCT columns, regardless if weather I specify a .distinct() or not. In the above case, .select() always includes the :id column wether or not you are also using .distinct() . So why would the .select() override my .distinct(:id) declaration?
Solution?
First solution is to not use .select() with a query. Duh.
But I really like .select() . It’s a great way to reduce memory size when instantiating hundreds or thousands of User ActiveRecord objects. And in my real-world use case, the User table has a couple of text columns that contain large amounts of html. So I like to use .select() where ever I can.
So I guess there really isn’t a solution that I know of, except not to use .select() with ActiveRecord queries on HABTM tables that use the -> { uniq } scope.
Bummer.
Best Solution!
You knew it was coming, didn’t cha. I wouldn’t leave you hanging.
The best solution is to not use the -> { uniq } scope with HABTM relations, but enforce the uniqueness condition at the database level with unique compound indexes.
In your migration for the relationship table just add unique: true to the index declaration.
1 2 3 4 5 6 7 8 9 10 |
class CreateUnitsUsers < ActiveRecord::Migration def change create_table :units_users, id: false do |t| t.references :unit t.references :user end add_index :units_users, [:unit_id, :user_id], unique: true add_index :units_users, [:user_id, :unit_id], unique: true end end |
You should really mention the database you are using here, makes a big difference in the sql that is outputted.
Good point. I was using Postgres for these examples. But Rails will automatically generated the correct SQL regardless of database you are using.