
What Is The Best Way To Iterate Over A Python List, Excluding Certain Values And Printing Out The Result

I am new to python and have a question: I have checked similar questions, checked the tutorial dive into python, checked the python documentation, googlebinging, similar Stack Over

Solution 1:

Try something like this:

def legit(string):
    if string.startswith("Photo:") or "None" in string:
        return False
    else:
        return True

whatyouwant = [x for x in data if legit(x)]

I'm not sure if this will work out of the box for your data, but you get the idea. If you're not familiar, [x for x in data if legit(x)] is called a list comprehension: it builds a new list from the items of data for which legit(x) is true.
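As a rough sketch of what the comprehension does, here it is next to the equivalent explicit loop. The sample data list is made up for illustration, legit is condensed to a single return, and the snippet is written for Python 3.

```python
# Hypothetical sample data; real entries would come from the tweets file.
data = ["Photo: sunset", "Hello world", "None", "Good morning"]

def legit(string):
    # Reject photo captions and any entry containing the text "None".
    return not (string.startswith("Photo:") or "None" in string)

# List comprehension: build a new list from the entries that pass legit().
whatyouwant = [x for x in data if legit(x)]

# Equivalent explicit loop, for comparison.
looped = []
for x in data:
    if legit(x):
        looped.append(x)

print(whatyouwant)
```

Both forms produce the same list; the comprehension is just the more compact spelling.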

Solution 2:

First of all, only add Tweets.get('text') if there is a text entry:

with open('output.txt') as fp:
    for line in iter(fp.readline, ''):
        Tweets = json.loads(line)
        if 'text' in Tweets:
            data.append(Tweets['text'])

This way no None entries are added (.get() returns None if the 'text' key is not present in the dictionary).
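To make the difference concrete, here is a minimal sketch comparing .get() with a membership test. The two JSON lines are invented for illustration, and the snippet is written for Python 3.

```python
import json

# Hypothetical JSON lines: one tweet with a 'text' key, one without.
lines = ['{"text": "Hello world"}', '{"delete": {"status": 1}}']

with_get = []
with_check = []
for line in lines:
    tweet = json.loads(line)
    with_get.append(tweet.get('text'))  # appends None when 'text' is absent
    if 'text' in tweet:                 # only appends real text values
        with_check.append(tweet['text'])

print(with_get)
print(with_check)
```

The first list ends up containing a None entry for the tweet without text; the second does not.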

I'm assuming here that you want to further process the data list you are building here. If not, you can dispense with the for entry in data: loops below and stick to one loop with if statements. Tweets['text'] is the same value as entry in the for entry in data loops.

Next, you are looping over python unicode values, so use the methods provided on those objects to filter out what you don't want:

for entry in data:
    if not entry.startswith("Photo:"):
        print entry

You can use a list comprehension here; the following would print all entries too, in one go:

print '\n'.join([entry for entry in data if not entry.startswith("Photo:")])

In this case that doesn't really buy you much, as you are building one big string just to print it; you may as well just print the individual strings and avoid the string building cost.

Note that all your data is Unicode data. Perhaps what you wanted is to filter out text that uses codepoints beyond the ASCII range. You could use a regular expression to detect codepoints beyond ASCII in your text:

import re
nonascii = re.compile(ur'[^\x00-\x7f]', re.UNICODE)  # all codepoints beyond 0x7F are non-ascii

for entry in data:
    if entry.startswith("Photo:") or nonascii.search(entry):
        continue  # skip the rest of this iteration, continue to the next
    print entry

Short demo of the non-ASCII expression:

>>> import re
>>> nonascii = re.compile(ur'[^\x00-\x7f]', re.UNICODE)
>>> nonascii.search(u'All you see is ASCII')
>>> nonascii.search(u'All you see is ASCII plus a little more unicode, like the EM DASH codepoint: \u2014')
<_sre.SRE_Match object at 0x1086275e0>
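The ur'' string prefix used above is Python 2 only and is a syntax error on Python 3. As a hedged sketch, the same filter on Python 3 might look like the following; the sample entries are invented for illustration.

```python
import re

# Matches any character outside the ASCII range 0x00-0x7F.
nonascii = re.compile(r'[^\x00-\x7f]')

# Hypothetical entries for illustration.
data = ['Photo: a cat', 'All you see is ASCII', 'An em dash: \u2014']

# Keep entries that are not photo captions and contain only ASCII.
kept = [entry for entry in data
        if not entry.startswith('Photo:') and not nonascii.search(entry)]
print(kept)
```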

Solution 3:

with open('output.txt') as fp:
    for line in fp.readlines():
        Tweets = json.loads(line)
        if 'text' not in Tweets: continue

        txt = Tweets.get('text')
        if txt.replace('.', '').replace('?','').replace(' ','').isalnum():
            data.append(txt)
            print txt

Small and simple. The basic principle: use one loop, and if the data matches your "OK" criteria, add it and print it.

As Martijn pointed out, 'text' might not be in all the Tweets data.


A regexp replacement for the chained .replace() calls would go something along the lines of: if re.match(r'^[\w -]+$', txt) is not None: (this is only a rough sketch; it will not handle every kind of whitespace, etc.)
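For comparison, here is a small sketch that runs the .replace()/isalnum() check and a regular-expression check side by side. The pattern is an assumption: it allows word characters, spaces, '.' and '?' to mirror the characters stripped by the .replace() calls. The helper names and sample strings are hypothetical, and the snippet is written for Python 3.

```python
import re

def ok_replace(txt):
    # Strip '.', '?' and spaces, then require letters and digits only.
    return txt.replace('.', '').replace('?', '').replace(' ', '').isalnum()

# Assumed pattern: word characters, spaces, '.' and '?' only.
ok_pattern = re.compile(r'^[\w .?]+$')

def ok_regex(txt):
    return ok_pattern.match(txt) is not None

samples = ['Is this OK?', 'Photo: a cat', 'snake_case']
for txt in samples:
    print(txt, ok_replace(txt), ok_regex(txt))
```

The two checks are close but not identical: \w also accepts underscores, so 'snake_case' passes the regex check but not the isalnum() check.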

Solution 4:

I'd suggest something like the following:

# use itertools.ifilter to remove items from a list according to a function
from itertools import ifilter
import json
import re

# write a function to filter out entries you don't want
def my_filter(value):
    if not value or value.startswith('Photo:'):
        return False
    # exclude unwanted chars (search, not match, so the whole string is checked)
    if re.search('[^\x00-\x7F]', value):
        return False
    return True

# Reading the data can be simplified with a list comprehension
with open('output.txt') as fp:
    data = [json.loads(line).get('text') for line in fp]

# do the filtering
data = list(ifilter(my_filter, data))

# print the output
for line in data:
    print line

Regarding unicode: assuming you're using Python 2.x, the open function won't read data as unicode; it will be read as the str type. You might want to convert it if you know the encoding, or read the file with a given encoding using codecs.open.
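A minimal sketch of reading the file with an explicit encoding follows. It uses io.open rather than codecs.open (both accept an encoding argument; io.open is available from Python 2.6 and is the built-in open on Python 3). The UTF-8 encoding, the temporary file, and its contents are assumptions made so the example is self-contained.

```python
import io
import json
import os
import tempfile

# Write a small hypothetical UTF-8 file so the sketch can run on its own.
path = os.path.join(tempfile.mkdtemp(), 'output.txt')
with io.open(path, 'w', encoding='utf-8') as fp:
    fp.write(u'{"text": "caf\u00e9"}\n')
    fp.write(u'{"text": "Photo: a cat"}\n')

# Opening with an encoding decodes each line to unicode text while reading.
with io.open(path, 'r', encoding='utf-8') as fp:
    data = [json.loads(line).get('text') for line in fp]

print(data)
```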

Solution 5:

Try this:

with open('output.txt') as fp:
    for line in iter(fp.readline, ''):
        Tweets = json.loads(line)
        data.append(Tweets.get('text'))

i = 0
while i < len(data):
    # these conditions will skip (continue) over the iterations
    # matching your first two conditions.
    if data[i] is None or data[i].startswith("Photo"):
        i = i + 1  # advance before skipping, otherwise the loop never ends
        continue
    print data[i]
    i = i + 1
