Bulk load data conversion error (truncation) for concept_synonym_name #1047

stpatnoe · 2024-09-05T18:41:51Z

While running the SQL Server BULK INSERT script for the CONCEPT_SYNONYM table for the latest vocabulary (v5.0 30-AUG-24), I'm getting the following error message for multiple rows:

Msg 4863, Level 16, State 1, Line 85
Bulk load data conversion error (truncation) for row 420061, column 2 (concept_synonym_name).

The row referenced in the error message above is for concept_id 1251539. It looks like the concept_synonym_name field was cut at 1000 characters but then continues on a second line?

The bulk insert script I'm using is the same format as the one provided below from this link.

TRUNCATE TABLE CONCEPT_SYNONYM;
BULK INSERT CONCEPT_SYNONYM
FROM 'C:\CDMV5VOCAB\CONCEPT_SYNONYM.csv'
WITH (
FIRSTROW = 2,
FIELDTERMINATOR = '\t',
ROWTERMINATOR = '0x0a',
ERRORFILE = 'C:\CDMV5VOCAB\CONCEPT_SYNONYM.bad',
TABLOCK
);

AlaikseiKatyshou · 2024-09-06T12:20:59Z

Hi @stpatnoe,

I think the error is caused by the concept_synonym_name field in the concept_synonym table having the VARCHAR(1000) type, while concept 1251539 has a synonym name that is truncated to 1000 characters and contains the Unicode character Ξ.
A VARCHAR(1000) field is designed to store up to 1000 bytes of non-Unicode text, where each character takes 1 byte.
A 1000-character string containing Unicode characters will not fit in that field, because a Unicode character can take more than 1 byte.
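
To make the byte-versus-character point concrete, here is a minimal Python sketch. It assumes the vocabulary file is UTF-8 encoded, which the thread does not confirm, and the sample string is made up for illustration rather than taken from the real synonym. The helper name `utf8_len` is hypothetical.

```python
# Sketch: character count vs. encoded byte count for a name containing "Ξ".
# Assumption: the file is UTF-8; the sample string is invented, not real data.
LIMIT = 1000  # the declared length of VARCHAR(1000)


def utf8_len(text: str) -> int:
    """Return the number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


# Exactly 1000 characters, one of which is outside ASCII.
name = "Ξ" + "a" * (LIMIT - 1)

print(len(name))               # 1000 (characters)
print(utf8_len(name))          # 1001 (bytes): "Ξ" takes 2 bytes in UTF-8
print(utf8_len(name) > LIMIT)  # True: one byte over a 1000-byte limit
```

A string that is exactly at the character limit can therefore still exceed a limit that is counted in bytes.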

As a solution to this issue, I would suggest changing the data type of the concept_synonym_name field from VARCHAR(1000) to NVARCHAR(1000), since NVARCHAR is designed to work with Unicode text.
@clairblacketer

cgreich · 2024-09-06T14:19:52Z

Somebody in the vocab team should remove that "synonym" altogether.

Alexdavv · 2024-09-14T14:39:18Z

> Somebody in the vocab team should remove that "synonym" altogether.

Maybe, but it wouldn't solve the DDL issue.

stpatnoe · 2024-09-16T13:58:51Z

Switching to NVARCHAR(1000) did not resolve the issue for me. It looks like there are 54 concepts in the 'OMOP Invest Drug' vocabulary_id that have more than 1000 characters. I was able to use the bulk insert after updating the concept_synonym_name column to NVARCHAR(MAX):

ALTER TABLE CONCEPT_SYNONYM ALTER COLUMN concept_synonym_name NVARCHAR(MAX) NOT NULL;

TRUNCATE TABLE CONCEPT_SYNONYM;
BULK INSERT CONCEPT_SYNONYM
FROM 'C:\CDMV5VOCAB\CONCEPT_SYNONYM.csv'
WITH (
FIRSTROW = 2,
FIELDTERMINATOR = '\t',
ROWTERMINATOR = '0x0a',
ERRORFILE = 'C:\CDMV5VOCAB\CONCEPT_SYNONYM.bad',
TABLOCK
);
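
For anyone who wants to check their own download before loading it, the following Python sketch lists rows whose concept_synonym_name exceeds a given limit, in characters or in UTF-8 bytes. It rests on several assumptions not confirmed in this thread: the file is UTF-8, tab-delimited, unquoted, and has a header row with lowercase column names `concept_id` and `concept_synonym_name`. The function name `overlong_synonyms` and the sample rows are hypothetical.

```python
import csv
from typing import Iterable, Iterator, Tuple


def overlong_synonyms(
    lines: Iterable[str], limit: int = 1000
) -> Iterator[Tuple[str, int, int]]:
    """Yield (concept_id, char_count, utf8_byte_count) for names over the limit."""
    # Assumption: tab-delimited, no quoting, header row present.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        name = row["concept_synonym_name"]
        n_chars = len(name)
        n_bytes = len(name.encode("utf-8"))
        if n_chars > limit or n_bytes > limit:
            yield row["concept_id"], n_chars, n_bytes


# Tiny in-memory sample standing in for the real file (all values made up).
sample = [
    "concept_id\tconcept_synonym_name\tlanguage_concept_id\n",
    "1\tshort name\t0\n",
    "2\t" + "x" * 1001 + "\t0\n",      # too long in characters
    "3\tΞ" + "y" * 999 + "\t0\n",      # 1000 characters, but 1001 bytes
]

for concept_id, n_chars, n_bytes in overlong_synonyms(sample):
    print(concept_id, n_chars, n_bytes)
# prints:
# 2 1001 1001
# 3 1000 1001
```

To run it against the real file, one would pass an open file object instead of the sample list, for example `open(r"C:\CDMV5VOCAB\CONCEPT_SYNONYM.csv", encoding="utf-8", newline="")`. If a name contains an embedded line break, this reader would see it as two rows, so the counts should be treated as a rough check only.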

TinyRickC137 added the Drug source label Sep 8, 2024
