# V2 Deduplication Implementation Plan

## Problem Statement

Currently, ImportServiceV2 allows duplicate Person records and related entities when:
1. A ClientCase with the same `client_ref` already exists in the database
2. A Contract with the same `reference` already exists for the client
3. Person data is present in the import row

This causes data duplication because V2 doesn't check for existing entities before creating Person and related entities (addresses, phones, emails, activities).

## V1 Deduplication Strategy (Analysis)

### V1 Person Resolution Order (Lines 913-1015)
V1 follows this hierarchical lookup before creating a new Person:

1. **Contract Reference Lookup** (Lines 913-922)
   - If contract.reference exists → Find existing Contract → Get ClientCase → Get Person
   - Prevents creating new Person when Contract already exists

2. **Account Result Derivation** (Lines 924-936)
   - If Account processing resolved/created a Contract → Get ClientCase → Get Person

3. **ClientCase.client_ref Lookup** (Lines 937-945)
   - If client_ref exists → Find ClientCase by (client_id, client_ref) → Get Person
   - Prevents creating new Person when ClientCase already exists

4. **Contact Values Lookup** (Lines 949-964)
   - Check Email.value → Get Person
   - Check PersonPhone.nu → Get Person
   - Check PersonAddress.address → Get Person

5. **Person Identifiers Lookup** (Lines 1005-1007)
   - Check tax_number, ssn, etc. via `findPersonIdByIdentifiers()`

6. **Create New Person** (Lines 1009-1011)
   - Only if all above fail

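The lookup order above amounts to a first-match-wins chain. The following is a minimal standalone sketch of that idea, not V1's actual code: `resolvePersonId()` and the in-memory lookup arrays are hypothetical stand-ins for the database queries.

```php
<?php
declare(strict_types=1);

/**
 * Return the first non-null Person ID produced by an ordered list of resolvers.
 *
 * @param array<string, callable(array): ?int> $resolvers keyed by a label for logging
 * @return array{0: ?int, 1: ?string} [personId, label of the resolver that matched]
 */
function resolvePersonId(array $resolvers, array $row): array
{
    foreach ($resolvers as $label => $resolver) {
        $personId = $resolver($row);
        if ($personId !== null) {
            return [$personId, $label];
        }
    }

    return [null, null]; // the caller creates a new Person only in this case
}

// Hypothetical in-memory stand-ins for the database lookups.
$personByContractRef = ['REF-123' => 7];
$personByClientRef   = ['B387055' => 7];
$personByEmail       = ['john@test.com' => 9];

// Order matters: the most specific chain (Contract) is tried first.
$resolvers = [
    'contract_reference' => fn (array $r): ?int => $personByContractRef[$r['contract_reference'] ?? ''] ?? null,
    'client_ref'         => fn (array $r): ?int => $personByClientRef[$r['client_ref'] ?? ''] ?? null,
    'email'              => fn (array $r): ?int => $personByEmail[$r['email'] ?? ''] ?? null,
];

[$personId, $matchedBy] = resolvePersonId(
    $resolvers,
    ['client_ref' => 'B387055', 'email' => 'john@test.com']
);
```

Here the row matches both `client_ref` and `email`, but they point at different Persons; the earlier resolver wins, which is why the order of the chain is part of the design.
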
### V1 Contract Deduplication (Lines 2158-2196)

**Early Contract Lookup** (Lines 2168-2180):
```php
// Try to find existing contract EARLY by (client_id, reference)
// across all cases to prevent duplicates
$existing = Contract::query()->withTrashed()
    ->join('client_cases', 'contracts.client_case_id', '=', 'client_cases.id')
    ->where('client_cases.client_id', $clientId)
    ->where('contracts.reference', $reference)
    ->select('contracts.*')
    ->first();
```

**ClientCase Reuse Logic** (Lines 2214-2228):
```php
// If we have a client and client_ref, try to reuse existing case
// to avoid creating extra persons
if ($clientId && $clientRef) {
    $cc = ClientCase::where('client_id', $clientId)
        ->where('client_ref', $clientRef)
        ->first();
    if ($cc) {
        // Reuse this case
        $clientCaseId = $cc->id;
        // If case has no person yet, set it
        if (!$cc->person_id) {
            // Find or create person and attach
        }
    }
}
```

### Key V1 Design Principles

✅ **Resolution before Creation** - Always check for existing entities first
✅ **Chain Derivation** - Contract → ClientCase → Person (reuse existing chain)
✅ **Contact Deduplication** - Match by email/phone/address before creating
✅ **Client-Scoped Lookups** - All queries scoped to import.client_id
✅ **Minimal Person Creation** - Only create Person as last resort

## V2 Current Architecture Issues

### Problem Areas

1. **PersonHandler** (`app/Services/Import/Handlers/PersonHandler.php`)
   - Currently only deduplicates by tax_number/ssn (Lines 38-58)
   - Doesn't check if Person exists via Contract/ClientCase
   - Processes independently without context awareness

2. **ClientCaseHandler** (`app/Services/Import/Handlers/ClientCaseHandler.php`)
   - Correctly resolves by client_ref (Lines 16-27)
   - But doesn't prevent PersonHandler from running afterwards

3. **ContractHandler** (`app/Services/Import/Handlers/ContractHandler.php`)
   - Missing early resolution logic
   - Doesn't derive Person from existing Contract chain

4. **Processing Order Issue**
   - Current priority: Person(100) → ClientCase(95) → Contract(90)
   - Person runs BEFORE we know if ClientCase/Contract exists
   - Should be reversed: Contract → ClientCase → Person

## V2 Deduplication Plan

### Phase 1: Reverse Processing Order ✅

**Change entity priorities in database seeder:**
```
// NEW ORDER (descending priority)
Contract: 100
ClientCase: 95
Person: 90
Email: 80
Address: 70
Phone: 60
Account: 50
Payment: 40
Activity: 30
```

**Rationale:** Process high-level entities first (Contract, ClientCase) so we can derive Person from existing chains.

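To make the effect of the priority change concrete, here is a small standalone sketch that derives the processing order from a priority map. It assumes the importer iterates entities by descending priority, as described above; `processingOrder()` and the lowercase entity keys are illustrative names, not part of the codebase.

```php
<?php
declare(strict_types=1);

/**
 * Order entity roots by descending priority.
 *
 * @param array<string, int> $priorities entity root => priority
 * @return list<string> entity roots in processing order
 */
function processingOrder(array $priorities): array
{
    arsort($priorities); // sort by value, descending, preserving keys

    return array_keys($priorities);
}

$old = ['person' => 100, 'client_case' => 95, 'contract' => 90];
$new = ['contract' => 100, 'client_case' => 95, 'person' => 90, 'email' => 80];

echo implode(' -> ', processingOrder($old)), PHP_EOL; // person -> client_case -> contract
echo implode(' -> ', processingOrder($new)), PHP_EOL; // contract -> client_case -> person -> email
```

With the new values, Person is handled only after Contract and ClientCase have had a chance to place their entities in the row context.
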
### Phase 2: Early Resolution Service 🔧

**Create:** `app/Services/Import/EntityResolutionService.php`

This service will be called BEFORE handlers process entities:

```php
class EntityResolutionService
{
    /**
     * Resolve Person ID from import context (existing entities).
     * Returns Person ID if found, null otherwise.
     */
    public function resolvePersonFromContext(
        Import $import,
        array $mapped,
        array $context
    ): ?int {
        // 1. Check if Contract already processed
        if ($contract = $context['contract']['entity'] ?? null) {
            $personId = $this->getPersonFromContract($contract);
            if ($personId) return $personId;
        }

        // 2. Check if ClientCase already processed
        if ($clientCase = $context['client_case']['entity'] ?? null) {
            if ($clientCase->person_id) {
                return $clientCase->person_id;
            }
        }

        // 3. Check for existing Contract by reference
        if ($contractRef = $mapped['contract']['reference'] ?? null) {
            $personId = $this->getPersonFromContractReference(
                $import->client_id,
                $contractRef
            );
            if ($personId) return $personId;
        }

        // 4. Check for existing ClientCase by client_ref
        if ($clientRef = $mapped['client_case']['client_ref'] ?? null) {
            $personId = $this->getPersonFromClientRef(
                $import->client_id,
                $clientRef
            );
            if ($personId) return $personId;
        }

        // 5. Check for existing Person by contact values
        $personId = $this->resolvePersonByContacts($mapped);
        if ($personId) return $personId;

        return null; // No existing Person found
    }

    /**
     * Check if ClientCase exists for this client_ref.
     */
    public function clientCaseExists(int $clientId, string $clientRef): bool
    {
        return ClientCase::where('client_id', $clientId)
            ->where('client_ref', $clientRef)
            ->exists();
    }

    /**
     * Check if Contract exists for this reference.
     */
    public function contractExists(int $clientId, string $reference): bool
    {
        return Contract::query()
            ->join('client_cases', 'contracts.client_case_id', '=', 'client_cases.id')
            ->where('client_cases.client_id', $clientId)
            ->where('contracts.reference', $reference)
            ->exists();
    }

    private function getPersonFromContract(Contract $contract): ?int
    {
        if ($contract->client_case_id) {
            return ClientCase::where('id', $contract->client_case_id)
                ->value('person_id');
        }
        return null;
    }

    private function getPersonFromContractReference(
        ?int $clientId,
        string $reference
    ): ?int {
        if (!$clientId) return null;

        $clientCaseId = Contract::query()
            ->join('client_cases', 'contracts.client_case_id', '=', 'client_cases.id')
            ->where('client_cases.client_id', $clientId)
            ->where('contracts.reference', $reference)
            ->value('contracts.client_case_id');

        if ($clientCaseId) {
            return ClientCase::where('id', $clientCaseId)
                ->value('person_id');
        }

        return null;
    }

    private function getPersonFromClientRef(
        ?int $clientId,
        string $clientRef
    ): ?int {
        if (!$clientId) return null;

        return ClientCase::where('client_id', $clientId)
            ->where('client_ref', $clientRef)
            ->value('person_id');
    }

    private function resolvePersonByContacts(array $mapped): ?int
    {
        // Check email
        if ($email = $mapped['email']['value'] ?? $mapped['emails'][0]['value'] ?? null) {
            $personId = Email::where('value', trim($email))->value('person_id');
            if ($personId) return $personId;
        }

        // Check phone
        if ($phone = $mapped['phone']['nu'] ?? $mapped['person_phones'][0]['nu'] ?? null) {
            $personId = PersonPhone::where('nu', trim($phone))->value('person_id');
            if ($personId) return $personId;
        }

        // Check address
        if ($address = $mapped['address']['address'] ?? $mapped['person_addresses'][0]['address'] ?? null) {
            $personId = PersonAddress::where('address', trim($address))->value('person_id');
            if ($personId) return $personId;
        }

        return null;
    }
}
```

### Phase 3: Update PersonHandler 🔧

**Modify:** `app/Services/Import/Handlers/PersonHandler.php`

Add resolution service check before creating:

```php
public function process(Import $import, array $mapped, array $raw, array $context = []): array
{
    // FIRST: Check if Person already resolved from context
    // (Contract/ClientCase chain, then contact values)
    $resolutionService = app(EntityResolutionService::class);
    $existingPersonId = $resolutionService->resolvePersonFromContext(
        $import,
        $mapped,
        $context
    );

    // Guard against a stale ID: find() returns null if the Person no longer exists
    if ($existingPersonId && ($existing = Person::find($existingPersonId))) {
        // Update if configured
        $mode = $this->getOption('update_mode', 'update');

        if ($mode === 'skip') {
            return [
                'action' => 'skipped',
                'entity' => $existing,
                'message' => 'Person already exists (found via Contract/ClientCase chain)',
            ];
        }

        // Update logic...
        return [
            'action' => 'updated',
            'entity' => $existing,
            'count' => 1,
        ];
    }

    // SECOND: Try existing deduplication (tax_number, ssn)
    $existing = $this->resolve($mapped, $context);

    if ($existing) {
        // Same skip/update handling as above...
        // Must return here, otherwise execution falls through to creation
        return [
            'action' => 'updated',
            'entity' => $existing,
            'count' => 1,
        ];
    }

    // THIRD: Contact deduplication needs no separate call here.
    // resolvePersonFromContext() already runs resolvePersonByContacts()
    // as its last step, and that method is private to the service.

    // LAST: Create new Person only if all checks failed
    $payload = $this->buildPayload($mapped);
    $person = Person::create($payload);

    return [
        'action' => 'inserted',
        'entity' => $person,
        'count' => 1,
    ];
}
```

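The skip/update branch for an existing record recurs in the Person, Contract and ClientCase handlers. As a rough illustration of that branch in isolation, the sketch below uses plain arrays instead of Eloquent models. `resultForExisting()` is a hypothetical helper, and the rule of ignoring null incoming values is an assumption about the elided "Update logic", not something the plan specifies.

```php
<?php
declare(strict_types=1);

/**
 * Build a handler result for a record that already exists.
 *
 * @param array<string, mixed> $existing stored attributes
 * @param array<string, mixed> $payload  incoming attributes from the import row
 */
function resultForExisting(array $existing, array $payload, string $mode = 'update'): array
{
    if ($mode === 'skip') {
        return [
            'action'  => 'skipped',
            'entity'  => $existing,
            'message' => 'Already exists (skip mode)',
        ];
    }

    // Assumption: only non-null incoming values overwrite stored ones,
    // so an empty import column does not erase existing data.
    $changes = array_filter($payload, fn ($value) => $value !== null);

    return [
        'action' => 'updated',
        'entity' => array_merge($existing, $changes),
        'count'  => 1,
    ];
}

$stored = ['id' => 1, 'name' => 'John', 'email' => null];
$result = resultForExisting($stored, ['name' => null, 'email' => 'john@test.com']);
// $result['entity'] keeps name 'John' and gains the email.
```
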
### Phase 4: Update ContractHandler 🔧

**Modify:** `app/Services/Import/Handlers/ContractHandler.php`

Add early Contract lookup and ClientCase reuse:

```php
public function process(Import $import, array $mapped, array $raw, array $context = []): array
{
    $clientId = $import->client_id;
    $reference = $mapped['reference'] ?? null;

    if (!$clientId || !$reference) {
        return [
            'action' => 'invalid',
            'errors' => ['Contract requires client_id and reference'],
        ];
    }

    // EARLY LOOKUP: Check if Contract exists across all cases
    $existing = Contract::query()
        ->join('client_cases', 'contracts.client_case_id', '=', 'client_cases.id')
        ->where('client_cases.client_id', $clientId)
        ->where('contracts.reference', $reference)
        ->select('contracts.*')
        ->first();

    if ($existing) {
        // Contract exists - update or skip
        $mode = $this->getOption('update_mode', 'update');

        if ($mode === 'skip') {
            return [
                'action' => 'skipped',
                'entity' => $existing,
                'message' => 'Contract already exists',
            ];
        }

        // Update logic...
        return [
            'action' => 'updated',
            'entity' => $existing,
            'count' => 1,
        ];
    }

    // Creating new Contract - resolve/create ClientCase
    $clientCaseId = $this->resolveOrCreateClientCase($import, $mapped, $context);

    if (!$clientCaseId) {
        return [
            'action' => 'invalid',
            'errors' => ['Unable to resolve client_case_id'],
        ];
    }

    // Create Contract
    $payload = array_merge($this->buildPayload($mapped), [
        'client_case_id' => $clientCaseId,
    ]);

    $contract = Contract::create($payload);

    return [
        'action' => 'inserted',
        'entity' => $contract,
        'count' => 1,
    ];
}

protected function resolveOrCreateClientCase(
    Import $import,
    array $mapped,
    array $context
): ?int {
    $clientId = $import->client_id;
    $clientRef = $mapped['client_ref'] ??
        $context['client_case']['entity']?->client_ref ??
        null;

    // If ClientCase already processed in this row
    if ($clientCaseId = $context['client_case']['entity']?->id ?? null) {
        return $clientCaseId;
    }

    // Try to find existing ClientCase by client_ref
    if ($clientRef) {
        $existing = ClientCase::where('client_id', $clientId)
            ->where('client_ref', $clientRef)
            ->first();

        if ($existing) {
            // REUSE existing ClientCase (and its Person)
            return $existing->id;
        }
    }

    // Create new ClientCase. With the Phase 1 order, Contract runs before
    // Person, so the context usually holds no Person at this point.
    $personId = $context['person']['entity']?->id ?? null;

    if (!$personId) {
        // No Person available yet: create a minimal one. PersonHandler is
        // expected to resolve it later via the Contract chain and fill it in.
        $personId = Person::create(['type_id' => 1])->id;
    }

    $clientCase = ClientCase::create([
        'client_id' => $clientId,
        'person_id' => $personId,
        'client_ref' => $clientRef,
    ]);

    return $clientCase->id;
}
```

### Phase 5: Update ClientCaseHandler 🔧

**Modify:** `app/Services/Import/Handlers/ClientCaseHandler.php`

Ensure it uses resolved Person from context:

```php
public function process(Import $import, array $mapped, array $raw, array $context = []): array
{
    $clientId = $import->client_id ?? null;
    $clientRef = $mapped['client_ref'] ?? null;

    // Get Person from context if an earlier handler resolved one.
    // With the Phase 1 order, Person runs after ClientCase, so this may be null.
    $personId = $context['person']['entity']?->id ?? null;

    if (!$clientId) {
        return [
            'action' => 'skipped',
            'message' => 'ClientCase requires client_id',
        ];
    }

    $existing = $this->resolve($mapped, $context);

    if ($existing) {
        $mode = $this->getOption('update_mode', 'update');

        if ($mode === 'skip') {
            return [
                'action' => 'skipped',
                'entity' => $existing,
                'message' => 'ClientCase already exists (skip mode)',
            ];
        }

        $payload = $this->buildPayload($mapped, $existing);

        // Update person_id ONLY if provided and different
        if ($personId && $existing->person_id !== $personId) {
            $payload['person_id'] = $personId;
        }

        $appliedFields = $this->trackAppliedFields($existing, $payload);
        $existing->update($payload);

        return [
            'action' => 'updated',
            'entity' => $existing,
            'count' => 1,
        ];
    }

    // Create new ClientCase
    $payload = $this->buildPayload($mapped);

    // Attach Person if resolved
    if ($personId) {
        $payload['person_id'] = $personId;
    }

    $payload['client_id'] = $clientId;

    $clientCase = ClientCase::create($payload);

    return [
        'action' => 'inserted',
        'entity' => $clientCase,
        'count' => 1,
    ];
}
```

### Phase 6: Integration into ImportServiceV2 🔧

**Modify:** `app/Services/Import/ImportServiceV2.php`

Inject resolution service into processRow:

```php
protected function processRow(Import $import, array $mapped, array $raw, array $context): array
{
    $entityResults = [];
    $lastEntityType = null;
    $lastEntityId = null;
    $hasErrors = false;

    // NEW: Add resolution service to context
    $context['resolution_service'] = app(EntityResolutionService::class);

    // Process entities in configured priority order
    foreach ($this->entityConfigs as $root => $config) {
        // ... existing logic ...
    }

    // ... rest of method ...
}
```

## Implementation Checklist

### Step 1: Update Database Priority ✅
- [ ] Modify `database/seeders/ImportEntitiesV2Seeder.php`
- [ ] Change priorities: Contract(100), ClientCase(95), Person(90)
- [ ] Run seeder: `php artisan db:seed --class=ImportEntitiesV2Seeder --force`

### Step 2: Create EntityResolutionService 🔧
- [ ] Create `app/Services/Import/EntityResolutionService.php`
- [ ] Implement all resolution methods
- [ ] Add comprehensive PHPDoc
- [ ] Add logging for debugging

### Step 3: Update PersonHandler 🔧
- [ ] Modify `process()` method to check resolution service first
- [ ] Add contact-based deduplication
- [ ] Ensure proper skip/update modes

### Step 4: Update ContractHandler 🔧
- [ ] Add early Contract lookup (client_id + reference)
- [ ] Implement ClientCase reuse logic
- [ ] Prevent duplicate Contract creation

### Step 5: Update ClientCaseHandler 🔧
- [ ] Use Person from context
- [ ] Handle person_id properly on updates
- [ ] Maintain existing deduplication

### Step 6: Integrate into ImportServiceV2 🔧
- [ ] Add resolution service to context
- [ ] Test with existing imports

### Step 7: Testing 🧪
- [ ] Test import with existing client_ref
- [ ] Test import with existing contract reference
- [ ] Test import with existing email/phone
- [ ] Test mixed scenarios
- [ ] Verify no duplicate Persons created
- [ ] Check all related entities linked correctly

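The "no duplicate Persons" check in Step 7 boils down to importing the same `client_ref` twice and asserting that the Person count does not grow. The sketch below shows the shape of that assertion against a tiny in-memory stand-in. `FakeImporter` is hypothetical and is not ImportServiceV2; a real test would run against the application's database through the project's test framework.

```php
<?php
declare(strict_types=1);

/** Minimal in-memory stand-in for the import pipeline (illustration only). */
final class FakeImporter
{
    /** @var array<int, array{name: string}> person_id => attributes */
    public array $persons = [];

    /** @var array<string, int> client_ref => person_id */
    public array $caseToPerson = [];

    public function importRow(array $row): int
    {
        $ref = $row['client_ref'];

        if (isset($this->caseToPerson[$ref])) {
            // Existing ClientCase: reuse its Person and update it in place.
            $personId = $this->caseToPerson[$ref];
            $this->persons[$personId]['name'] = $row['name'];

            return $personId;
        }

        // No existing chain: create the Person as a last resort.
        $personId = count($this->persons) + 1;
        $this->persons[$personId] = ['name' => $row['name']];
        $this->caseToPerson[$ref] = $personId;

        return $personId;
    }
}

$importer = new FakeImporter();
$first  = $importer->importRow(['client_ref' => 'B387055', 'name' => 'John']);
$second = $importer->importRow(['client_ref' => 'B387055', 'name' => 'John Smith']);

assert($first === $second);                 // same Person both times
assert(count($importer->persons) === 1);    // no duplicate created
assert($importer->persons[$first]['name'] === 'John Smith'); // updated, not re-inserted
```
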
## Expected Behavior After Implementation

### Scenario 1: Existing ClientCase by client_ref
```
Import Row: {client_ref: "B387055", name: "John", email: "john@test.com"}

Before V2 Fix:
❌ Creates new Person (duplicate)
❌ Creates new Email (duplicate)
✅ Reuses ClientCase

After V2 Fix:
✅ Finds existing Person via ClientCase
✅ Updates Person if needed
✅ Reuses ClientCase
✅ Reuses/updates Email
```

### Scenario 2: Existing Contract by reference
```
Import Row: {contract.reference: "REF-123", person.name: "Jane"}

Before V2 Fix:
❌ Creates new Person (duplicate)
❌ Contract might be created or updated
❌ New Person not linked to existing ClientCase

After V2 Fix:
✅ Finds existing Contract
✅ Derives Person from Contract → ClientCase chain
✅ Updates Person if needed
✅ No duplicate Person created
```

### Scenario 3: New Import (no existing entities)
```
Import Row: {client_ref: "NEW-001", name: "Bob"}

Behavior:
✅ Creates new Person
✅ Creates new ClientCase
✅ Links correctly
✅ No duplicates
```

## Success Criteria

✅ **No duplicate Persons** when client_ref or contract reference exists
✅ **Proper entity linking** - all entities connected to correct Person
✅ **Backward compatibility** - existing imports still work
✅ **Skip mode respected** - handlers honor skip/update modes
✅ **Contact deduplication** - matches by email/phone/address
✅ **Performance maintained** - no significant slowdown

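The resolution steps add several lookups per row, so the performance criterion may call for caching repeated lookups within one import run. The following is one possible approach, sketched under assumptions: `LookupCache` is a hypothetical helper that the plan does not define, and the key format is arbitrary.

```php
<?php
declare(strict_types=1);

/** Caches lookup results for the duration of one import run (hypothetical helper). */
final class LookupCache
{
    /** @var array<string, ?int> */
    private array $cache = [];

    /** Number of times the underlying lookup actually ran. */
    public int $misses = 0;

    /** @param callable(): ?int $lookup performs the real query on a cache miss */
    public function remember(string $key, callable $lookup): ?int
    {
        // array_key_exists (not isset) so that cached null results count as hits.
        if (!array_key_exists($key, $this->cache)) {
            $this->misses++;
            $this->cache[$key] = $lookup();
        }

        return $this->cache[$key];
    }

    /** Drop an entry, e.g. after a handler creates the entity that was missing. */
    public function forget(string $key): void
    {
        unset($this->cache[$key]);
    }
}

$cache = new LookupCache();
$key = 'client_ref:42:B387055'; // arbitrary key format: lookup type, client_id, value

$cache->remember($key, fn (): ?int => 7); // runs the lookup
$cache->remember($key, fn (): ?int => 7); // served from the cache
```

Because "not found" results are cached too, any handler that inserts a new ClientCase or Contract would need to call `forget()` for the matching key; otherwise later rows in the same run would still see the stale null and create duplicates.
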
## Rollback Plan

If issues occur:
1. Revert priority changes in database
2. Disable EntityResolutionService by commenting out context injection
3. Fall back to original handler behavior
4. Investigate and fix issues
5. Re-implement with fixes

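As an alternative to commenting code out during a rollback, the context injection could sit behind a setting. The sketch below is illustrative only: `buildRowContext()` and the `import.v2_entity_resolution` key are hypothetical names, and it takes effect only if handlers read the service from `$context['resolution_service']` rather than resolving it themselves.

```php
<?php
declare(strict_types=1);

/**
 * Add the resolution service to the row context only when the flag is on.
 *
 * @param array<string, mixed> $context     row context passed to handlers
 * @param array<string, mixed> $settings    application settings (hypothetical key below)
 * @param callable(): object   $makeService factory for the resolution service
 * @return array<string, mixed>
 */
function buildRowContext(array $context, array $settings, callable $makeService): array
{
    if ($settings['import.v2_entity_resolution'] ?? false) {
        $context['resolution_service'] = $makeService();
    }

    return $context;
}

// A handler would then fall back to its original behavior when the key is absent.
$context = buildRowContext([], ['import.v2_entity_resolution' => false], fn () => new stdClass());
$service = $context['resolution_service'] ?? null; // null: resolution disabled
```
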
## Notes

- This plan maintains V2's modular handler architecture
- Resolution logic is centralized in EntityResolutionService
- Handlers remain independent but context-aware
- Similar to V1 but cleaner separation of concerns
- Can be implemented incrementally (phase by phase)
- Each phase can be tested independently