'How to search for a 'text' or 'number' in a csv file with Python AND if exists print only first and second column values to a new csv file

I want to do the following using Python.

Step-1: Read a specific third column on a csv file using Python.
Step-2: Create a list with values got from step-1
Step-3: Take the value of index[0], search in csv file, if present print the values of column 1 and 2 only to a new csv file(There are 6 columns). If Not presents just ignore and goto next search.

file1.csv:

Country,Location,number,letter,name,pup-name,null
a,ab,1,qw,abcd,test1,3
b,cd,1,df,efgh,test2,4
c,ef,2,er,fgh,test3,5
d,gh,3,sd,sds,test4,
e,ij,5,we,sdrt,test5,
f,kl,6,sc,asdf,test6,
g,mn,7,df,xcxc,test7,
h,op,8,gb,eretet,test8,
i,qr,8,df,hjjh,test9,

Python script written for this:

import csv
import time
from collections import defaultdict

columns = defaultdict(list)

with open('file1.csv') as f:
    reader = csv.reader(f)
    reader.next()
    for row in reader:
        for (i,v) in enumerate(row):
            columns[i].append(v)
#print(columns[2])

b=(columns[2])
for x in b[:]:
    time.sleep(1)
    print x

Output of above script:

MacBook-Pro:test_usr$ python csv_file.py 
1
1
2
3
5
6
7
8
8
MacBook-Pro:test_usr$

I am able to do the steps 1 and 2.

Please guide me on doing Step-3. That is how to search for text/string in csv file and if present how to extract only specific column values to a new csv file?

Output file should look like:

a,ab
b,cd
c,ef
d,gh
e,ij
f,kl
g,mn
h,op
i,qr

Note : Search string will be from another csv file. Please don't suggest the direct answer for printing values of column 1 and 2 directly.

FINAL CODE is looks this:

import csv
import time
from collections import defaultdict

columns = defaultdict(list)

with open('file1.csv') as f:
    reader = csv.reader(f)
    reader.next()
    for row in reader:
        for (i,v) in enumerate(row):
            columns[i].append(v)
            b=(columns[2])

            for x in b[:]:
                with open('file2.csv') as f, open('file3.csv', 'a') as g:
                    reader = csv.reader(f)
                    #next(reader, None) # discard the header
                    writer = csv.writer(g)
                    for row in reader:
                        if row[2] == x:
                            writer.writerow(row[:2])

file1.csv:

Country,Location,number,letter,name,pup-name,null
a,ab,1,qw,abcd,test1,3
b,cd,1,df,efgh,test2,4
c,ef,2,er,fgh,test3,5
d,gh,3,sd,sds,test4,
e,ij,5,we,sdrt,test5,
f,kl,6,sc,asdf,test6,
g,mn,7,df,xcxc,test7,
h,op,8,gb,eretet,test8,
i,qr,8,df,hjjh,test9,

file2.csv:

count,name,number,Type,status,Config Version,,IP1,port
1,bob,1,TRAFFIC,end,1.2,,1.1.1.1,1
2,john,1,TRAFFIC,end,2.1,,1.1.1.2,2
4,foo,2,TRAFFIC,end,1.1,,1.1.1.3,3
5.333333333,test,3,TRAFFIC,end,3.1,,1.1.1.4,4
6.833333333,raa,5,TRAFFIC,end,5.1,,1.1.1.5,5
8.333333333,kaa,6,TRAFFIC,end,7.1,,1.1.1.6,6
9.833333333,thaa,7,TRAFFIC,end,9.1,,1.1.1.7,7
11.33333333,paa,8,TRAFFIC,end,11.1,,1.1.1.8,8
12.83333333,maa,8,TRAFFIC,end,13.1,,1.1.1.9,9

If I run the above script, output of file3.csv:

1,bob
2,john
1,bob
2,john
1,bob
2,john
1,bob
2,john
1,bob
2,john
1,bob
2,john
1,bob
2,john
1,bob
2,john
1,bob
2,john
1,bob
2,john
1,bob
2,john
1,bob
2,john
.
.
.

Its goes like this in loop

But output should be like this:

count,name
1,bob,
2,john,
4,foo,
5.333333333,test,
6.833333333,raa,
8.333333333,kaa,
9.833333333,thaa,
11.33333333,paa,
12.83333333,maa,


Solution 1:[1]

I think you should reconsider your approach. You can achieve your goal simply by iterating over the CSV file, without creating intermediate dicts and lists..., and since you want to work with specific columns, you'll make your life easier and your code more readable by using DictReader and DictWriter

import csv
import time

search_string = "whatever"

with open('file1.csv', 'rb') as f, open('file2.csv', 'wb') as g:
    reader = csv.DictReader(f)
    c1, c2, c3, *_ = reader.fieldnames
    writer = csv.DictWriter(g, fieldnames=(c1, c2))
    for row in reader:
        if row[c3] == search_string:
            writer.writerow({c1:row[c1], c2:row[c2]})

Keep in mind that csv module will always return strings. You have to handle data-type conversions yourself, if you need them (I've left that out form above).

If you don't want to use DictReader/DictWriter, I suppose it is a little more verbose, and don't want a header in your output file:

with open('file1.csv') as f, open('file2.csv', 'w') as g:
    reader = csv.reader(f)
    next(reader, None) # discard the header
    writer = csv.writer(g)
    for row in reader:
        if row[2] == search_string:
            writer.writerow(row[:2])

Solution 2:[2]

That is how to search for text/string in csv file and if present how to extract only specific column values to a new csv file?

This is two questions.

First question: to search for text in a file, the simplest answer would be to read the file text into memory and look for the text. If you want to look for the text in a specific column of the csv you're reading in, you can use a DictReader to make life easy:

for row in reader:
  if search_target in row[header]:
    # found it!

Second question: One way to write specific columns to a new csv would be as follows:

keys = ["Country", "Location"]
new_rows = [{key: row[key] for key in keys} for row in reader]
writer = csv.DictWriter(somefile, keys)
writer.writerows(new_rows)

Solution 3:[3]

This may help to understand better. Reading two csv files and matching the row indexs values are same or not, If same, writing to another csv.

import numpy as np
import csv
import time
import os

output_dir = "D:\Laneending\data-ars540"
file1 = "3rd_test_rec_road_width_changing_scenarios_250_inference.csv"
file2 = "df_5_signals_1597515776730734.csv"

ars540 = os.path.join(output_dir, file1)
veh_dyn = os.path.join(output_dir, file2)

file3 = "df_5_signals_1597515776730734_processed.csv"
output_file = os.path.join(output_dir, file3)

with open(ars540, 'r') as f1, open(veh_dyn, 'r') as f2, \
        open(output_file, 'w+', newline='') as f3:
    f1_reader = csv.reader(f1)
    f2_reader = csv.reader(f2)
    header_f1 = []
    header_f1 = next(f1_reader) # reading the next line after header of csv file.

    header_f2 = []
    header_f2 = next(f2_reader) # reading the next line after header of csv file.
    count = 0
    writer = csv.writer(f3) #preparing the file f3 for writing the file.

    writer.writerow(["Timestamp", "no of detections", "velocity", "yawrate" , "afdr"])

    for row_f1 in f1_reader: # looking each row from csv file f1
        for row_f2 in f2_reader: # looking for each row from csv file f2
            if row_f1[1] == row_f2[0]: #checking the condition; worse case Time complexity o(n2)
                # print(row_f2)
                print(count)
                writer = csv.writer(f3)
                writer.writerows([row_f2])
                count +=1
                break

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Jon Kiparsky
Solution 3 prabhuiitdhn