synBuildTable throws unexpected error

Description

I am trying to upload a large CSV (2 GB) to Synapse as a table so that I can query it easily.

My workflow looks like this:

Download the csv: https://drugtargetcommons.fimm.fi/static/Excell_files/DTC_data.csv

Read into R:

 

    library(synapser)
    synLogin()

    dt <- readr::read_csv('DtcDrugTargetInteractions.csv', guess_max = 1000000)

    tab <- synBuildTable("DTC Bioactivities", parent = 'syn20857049', values = dt)

This runs for a bit but then throws the following error:

 

    > tab <- synBuildTable("DTC Bioactivities", parent = 'syn20857049', values = dt)
    Error in value[[3L]](cond) :
      Error tokenizing data. C error: Expected 33 fields in line 835530, saw 35

This is confusing, because the data frame (a tibble) only has 33 columns.

Row 835529 (ignoring the header...) looks like this:

    c(dt[835529,])
    $compound_id
    [1] "CHEMBL1475514"
    $standard_inchi_key
    [1] "DBCSPALPUCNROA-UHFFFAOYSA-N"
    $compound_name
    [1] NA
    $synonym
    [1] NA
    $target_id
    [1] "Q8TBX8"
    $target_pref_name
    [1] "PHOSPHATIDYLINOSITOL-5-PHOSPHATE 4-KINASE TYPE-2 GAMMA"
    $gene_names
    [1] "PIP4K2C"
    $wildtype_or_mutant
    [1] NA
    $mutation_info
    [1] NA
    $pubmed_id
    [1] NA
    $standard_type
    [1] "Kd"
    $standard_relation
    [1] "="
    $standard_value
    [1] 7940
    $standard_units
    [1] "NM"
    $activity_comment
    [1] NA
    $ep_action_mode
    [1] NA
    $assay_format
    [1] NA
    $assaytype
    [1] NA
    $assay_subtype
    [1] NA
    $inhibitor_type
    [1] NA
    $detection_tech
    [1] NA
    $assay_cell_line
    [1] NA
    $compound_concentration_value
    [1] NA
    $compound_concentration_value_unit
    [1] NA
    $substrate_type
    [1] NA
    $substrate_relation
    [1] NA
    $substrate_value
    [1] NA
    $substrate_units
    [1] NA
    $assay_description
    [1] "Inhibition of PIP5K2C (unknown origin)"
    $title
    [1] "\\\"Compounds, pharmaceutical compositions, and methods of treating or preventing neurodegenerative diseases or disorders\\\""
    $journal
    [1] NA
    $doc_type
    [1] "PATENT"
    $annotation_comments
    [1] NA
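
One detail in the row above may explain the field count. The title value contains two commas, and if a parser did not honor the quoting around that field, it would split it into three pieces and see 35 fields instead of 33. The sketch below only illustrates that arithmetic with a naive comma split. It is an assumption about what goes wrong, not a confirmed description of how synapser serializes or parses the data.

```r
# The title value from row 835529, as stored in the tibble
# (it begins and ends with a backslash followed by a double quote).
title <- "\\\"Compounds, pharmaceutical compositions, and methods of treating or preventing neurodegenerative diseases or disorders\\\""

# Naive split on commas, i.e. what a parser would do if it ignored the quoting
pieces <- strsplit(title, ",", fixed = TRUE)[[1]]

# The other 32 columns contribute one field each; the title contributes
# one field per piece.
expected_columns <- 33
fields_seen <- (expected_columns - 1) + length(pieces)

print(length(pieces))  # 3
print(fields_seen)     # 35
```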

I'm guessing it's the 'title' column that is tripping up synapser. Either way, synapser appears to be interpreting the data in that column differently from readr::read_csv.
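
A possible workaround, assuming the backslash-escaped quotes in 'title' are what triggers the error, is to strip backslashes and double quotes from the character columns before building the table. `strip_escaped_quotes` is a hypothetical helper written for this sketch and is not part of synapser. I have not verified that it resolves the error.

```r
# Hypothetical helper (not part of synapser): remove backslashes and
# double quotes from every character column of a data frame or tibble.
strip_escaped_quotes <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], function(x) {
    x <- gsub("\\", "", x, fixed = TRUE)  # drop literal backslashes
    gsub("\"", "", x, fixed = TRUE)       # drop double quotes; NA stays NA
  })
  df
}

# Small stand-in for the failing row (title shortened for illustration)
example <- data.frame(
  compound_id = "CHEMBL1475514",
  title = "\\\"Compounds, pharmaceutical compositions, and methods\\\"",
  journal = NA_character_,
  stringsAsFactors = FALSE
)

cleaned <- strip_escaped_quotes(example)
print(cleaned$title)
```

If this helps, the cleaned data frame could be passed as `values = strip_escaped_quotes(dt)` in the same `synBuildTable` call as above. Note that this changes the stored titles, so it only works around the underlying parsing difference rather than fixing it.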

Environment

None

Status

Assignee

Unassigned

Reporter

Robert Allaway

Labels

None

Validator

Robert Allaway

Release Version History

None

Priority

Minor