bash - How to compare two columns from two different files, and add columns from file2 to file1 for multiple consecutive columns


I'm relatively new to coding and am using awk, so apologies if this is a silly question! I need to compare $3 in file 1 with $3 in file 2 and, if they match, print the line from file 1 together with the corresponding entry $10 from file 2. I have a command for this:

awk 'NR==FNR{a[$3]=$10; next} a[$3] {print $0 "\t" a[$3]}' file2 file1

However, file2 has columns $10-$647, and I need to do the above for all of those columns (roughly 637 of them). Is there a way to loop this?

Example file 1:

1  715348  rs3131984   t   g   100 pass    ac=5008;af=1;an=5008;ns=2504;dp=16986;eas_af=1;amr_af=1;afr_af=1;eur_af=1;sas_af=1;aa=.|||;vt=snp   gt  1|1 1|1 1|1
1  723798  rs34882115  cag c   100 pass    ac=4012;af=0.801118;an=5008;ns=2504;dp=24752;eas_af=0.7946;amr_af=0.8775;afr_af=0.5416;eur_af=0.9602;sas_af=0.9407;vt=indel gt  1|1 1|1 1|1
1  723891  rs2977670   g   c   100 pass    ac=3906;af=0.779952;an=5008;ns=2504;dp=22718;eas_af=0.7917;amr_af=0.8689;afr_af=0.4849;eur_af=0.9483;sas_af=0.9305;aa=.|||;vt=snp   gt  1|1 1|1 1|1
1  729679  rs4951859   c   g   100 pass    ac=3205;af=0.639976;an=5008;ns=2504;dp=18762;eas_af=0.6875;amr_af=0.7536;afr_af=0.2905;eur_af=0.841;sas_af=0.7761;aa=.|||;vt=snp    gt  1|0 1|1 1|0
1  752566  rs3094315   g     100 pass    ac=3597;af=0.718251;an=5008;ns=2504;dp=21293;eas_af=0.8839;amr_af=0.804;afr_af=0.3873;eur_af=0.84;sas_af=0.8088;aa=.|||;vt=snp  gt  0|1 1|1 0|1
1  752721  rs3131972     g   100 pass    ac=3272;af=0.653355;an=5008;ns=2504;dp=22729;eas_af=0.7659;amr_af=0.7363;afr_af=0.2905;eur_af=0.839;sas_af=0.7781;aa=.|||;vt=snp    gt  0|1 1|1 0|1
1  754182  rs3131969     g   100 pass    ac=3398;af=0.678514;an=5008;ns=2504;dp=16315;eas_af=0.7331;amr_af=0.7565;afr_af=0.3525;eur_af=0.8718;sas_af=0.8088;aa=.|||;vt=snp   gt  0|1 1|1 0|1
1  754192  rs3131968     g   100 pass    ac=3398;af=0.678514;an=5008;ns=2504;dp=16981;eas_af=0.7331;amr_af=0.7565;afr_af=0.3525;eur_af=0.8718;sas_af=0.8088;aa=.|||;vt=snp   gt  0|1 1|1 0|1
1  754334  rs3131967   t   c   100 pass    ac=3427;af=0.684305;an=5008;ns=2504;dp=21917;eas_af=0.7629;amr_af=0.755;afr_af=0.3525;eur_af=0.8718;sas_af=0.8088;aa=.|||;vt=snp    gt  0|1 1|1 0|1
1  754503  rs3115859   g     100 pass    ac=3325;af=0.663938;an=5008;ns=2504;dp=19944;eas_af=0.7629;amr_af=0.7378;afr_af=0.3374;eur_af=0.839;sas_af=0.771;aa=.|||;vt=snp gt  0|1 1|1 0|1
1  754964  rs3131966   c   t   100 pass    ac=3322;af=0.663339;an=5008;ns=2504;dp=19476;eas_af=0.7629;amr_af=0.7378;afr_af=0.3366;eur_af=0.837;sas_af=0.771;aa=.|||;vt=snp gt  0|1 1|1 0|1
1  755887  rs3131964   c   g   100 pass    ac=4905;af=0.979433;an=5008;ns=2504;dp=22796;eas_af=1;amr_af=0.9914;afr_af=0.9304;eur_af=0.995;sas_af=1;aa=.|||;vt=snp  gt  1|1 1|1 1|1
1  755890  rs3115858     t   100 pass    ac=3763;af=0.751398;an=5008;ns=2504;dp=23185;eas_af=0.8839;amr_af=0.8242;afr_af=0.4539;eur_af=0.8728;sas_af=0.8405;aa=.|||;vt=snp   gt  0|1 1|1 0|1
1  756604  rs3131962     g   100 pass    ac=3746;af=0.748003;an=5008;ns=2504;dp=28270;eas_af=0.8829;amr_af=0.8242;afr_af=0.4501;eur_af=0.8698;sas_af=0.8323;aa=.|||;vt=snp   gt  0|1 1|1 0|1

Example file 2:

1   742429  rs3094315     g   .   .   .   gt  0/0 0/0
1   1011278 rs3737728   g     .   .   .   gt  0/0 0/1
1   1077546 rs9442380   c   t   .   .   .   gt  0/0 0/0
1   1084601 rs4970362   g     .   .   .   gt  0/0 0/1
1   1089205 rs9660710   c     .   .   .   gt  0/0 0/0
1   1300787 rs2765033   c   t   .   .   .   gt  0/0 0/1
1   756604  rs3131962     g   100 pass    ac=3746;af=0.748003;an=5008;ns=2504;dp=28270;eas_af=0.8829;amr_af=0.8242;afr_af=0.4501;eur_af=0.8698;sas_af=0.8323;aa=.|||;vt=snp   gt  0|1 1|1
1   1303878 rs2649588   t   c   .   .   .   gt  0/0 0/1
1   1695996 rs6603811   c   t   .   .   .   gt  0/0 0/0
1   1782971 rs10907192  g     .   .   .   gt  0/0 0/0
1   1878053 rs3820011   c     .   .   .   gt  0/1 0/1
1   1882185 rs2803291   c   t   .   .   .   gt  0/0 0/0

Is awk the best way to do this? I'm not sure how to make loops of any sort. Also, explanations would be appreciated!

I would do:

$ column_file1=$(awk '{print NF}' file1 | tail -1)
$ paste file1 file2 | awk -v c1="$column_file1" '{if($3==$(3+c1)){for(i=1;i<=NF;i++)if(i<=c1 || i>c1+10){printf "%s ", $i}; printf "\n"}}'

paste joins the 2 files line by line, so line N of file1 is followed by line N of file2 on the same output line.

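The first if in awk checks whether the third fields of the 2 files match (after paste, the first field of the second file is $(1+c1), so its third field is $(3+c1)). If the condition is true, the loop prints each field on the same line (printf "%s ", $i), skipping the first 10 fields of the second file (if(i<=c1 || i>c1+10)); use i>c1+9 instead if column 10 itself should be kept. Once the loop has finished (so the whole line has been printed), a newline is printed. If the files are well structured (each field has the same width), you can use print $0 and pipe the output to colrm instead.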
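To make the field arithmetic concrete, here is a minimal sketch of the same paste idea on toy data. The files f1 and f2, their contents, and the skip variable are made up for illustration (3 columns are skipped here rather than 10), and the output is tab-separated rather than space-separated.

```shell
# Toy files (hypothetical): f1 has 3 columns, f2 has 5; the key is column 3 in both.
tmp=$(mktemp -d)
printf '1\t100\trsA\n1\t200\trsB\n' > "$tmp/f1"
printf '1\t100\trsA\tx1\tx2\n1\t999\trsZ\ty1\ty2\n' > "$tmp/f2"

# Number of fields in file 1, taken from its last line.
c1=$(awk '{print NF}' "$tmp/f1" | tail -1)

# After paste, field j of f2 becomes field c1+j of the combined line.
result=$(paste "$tmp/f1" "$tmp/f2" |
  awk -v c1="$c1" -v skip=3 '$3 == $(3 + c1) {
      line = $1
      for (i = 2; i <= NF; i++)
          if (i <= c1 || i > c1 + skip)   # keep all of f1, drop the first "skip" fields of f2
              line = line "\t" $i
      print line
  }')
printf '%s\n' "$result"
rm -r "$tmp"
```

Only the first pair of rows shares a key, so only that row is printed, with the two extra columns of f2 appended. Rows are compared strictly pairwise, so a key that sits on different line numbers in the two files is not matched by this approach.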

If you prefer to use only awk,

copy the following into a file:

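#!/bin/awk -f
{
    if (NR == FNR) {
        # First file on the command line (file2):
        # store columns 10-647, tab-separated, keyed by column 3.
        a[$3] = $10;
        for (i = 11; i <= 647; i++) {
            a[$3] = a[$3] "\t" $i
        };
        next
    }
    else {
        # Second file (file1): print the line plus the stored columns
        # when its column 3 was seen in file2.
        if ($3 in a) { print $0 "\t" a[$3] }
    }
}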

Then run it with

$ awk -f <name_of_file> file2 file1 
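As a variation, here is a minimal sketch that avoids hard-coding 647 by looping up to NF. It assumes tab-separated input, and the start variable and the toy files are hypothetical (start=4 here; it would be 10 for the files in the question).

```shell
# Toy files (hypothetical): key in column 3; file2 carries extra columns from column 4 on.
tmp=$(mktemp -d)
printf '1\t500\trsB\tb1\tb2\n1\t100\trsA\ta1\ta2\n' > "$tmp/file2"
printf '1\t100\trsA\tq\n1\t300\trsC\tq\n1\t500\trsB\tq\n' > "$tmp/file1"

joined=$(awk -v start=4 'BEGIN { FS = OFS = "\t" }
NR == FNR {                       # first file (file2): remember columns start..NF per key
    extra = $start
    for (i = start + 1; i <= NF; i++)
        extra = extra OFS $i
    a[$3] = extra
    next
}
$3 in a { print $0, a[$3] }       # second file (file1): append the stored columns on a match
' "$tmp/file2" "$tmp/file1")
printf '%s\n' "$joined"
rm -r "$tmp"
```

The rsA and rsB lines of file1 are printed with their file2 columns appended, and rsC is dropped because it has no match. Unlike the paste version, the order of the records in the two files does not matter here.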

Tell me if you have any problems.

Edit:

I forgot the tail -1 in the first example. It works with the examples provided.

