This repository has been archived by the owner on Mar 30, 2023. It is now read-only.

Fixes Publishing Data to Elasticsearch #994

Merged
merged 2 commits into from
Oct 29, 2020

@tweedge (Contributor) commented Oct 27, 2020

Fixes #963, which was caused by bad types. For simplicity, I also moved twint/storage/elasticsearch.py over to datetime, which is a more elegant solution and already has precedent in twint/storage/db.py.
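
As a rough illustration of the datetime approach (not the code from this PR), the sketch below parses a timestamp string once and derives typed fields from the resulting datetime object. The function name build_date_fields, the input format, and the field names are all assumptions made for the example; the real strings and document fields in twint may differ.

```python
from datetime import datetime

# Assumed input format for illustration only; twint's real strings may differ.
ASSUMED_FORMAT = "%Y-%m-%d %H:%M:%S"


def build_date_fields(raw: str) -> dict:
    """Parse a timestamp string once and derive typed fields from it."""
    parsed = datetime.strptime(raw, ASSUMED_FORMAT)
    return {
        # ISO 8601 string, e.g. "2020-10-27T14:05:09"
        "date": parsed.isoformat(),
        # Plain integer hour, 0 through 23
        "hour": parsed.hour,
        # ISO weekday number: Monday is 1, Sunday is 7
        "day": parsed.isoweekday(),
    }


fields = build_date_fields("2020-10-27 14:05:09")
```

Parsing into a datetime first means every derived field comes from one validated value, so a malformed string fails early with a ValueError instead of reaching the index with the wrong type.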

This change could break twint/storage/panda.py, which was the only caller of the hour function in elasticsearch.py. Since that function was no longer needed in elasticsearch.py and was used only once, I moved its logic inline into panda.py. Please test this if you're interested in the PR, or tell me what else you need me to do.
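
For readers unfamiliar with this kind of refactor, here is a minimal sketch of inlining a one-use helper. It assumes the helper formatted the local hour from an epoch timestamp in seconds; the name tweet_row and the dictionary layout are hypothetical and not taken from panda.py.

```python
from time import localtime, strftime


def tweet_row(epoch_seconds: float) -> dict:
    """Build a row, computing the hour inline rather than via a shared helper."""
    return {
        # Zero-padded local hour as a string, e.g. "09" or "14"
        "hour": strftime("%H", localtime(epoch_seconds)),
    }


row = tweet_row(1603807509)
```

The value depends on the machine's local timezone, so the same timestamp can yield different hours on different hosts.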

Hope this helps.

@rohanrajpal

Thanks a lot for this PR, I was facing the same issue and this fixed it.

@overcyber

New error using elasticsearch

Traceback (most recent call last):
  File "/usr/local/bin/twint", line 11, in
    load_entry_point('twint==2.1.21', 'console_scripts', 'twint')()
  File "/home/usuario/.local/lib/python3.6/site-packages/twint/cli.py", line 313, in run_as_command
    main()
  File "/home/usuario/.local/lib/python3.6/site-packages/twint/cli.py", line 305, in main
    run.Search(c)
  File "/home/usuario/.local/lib/python3.6/site-packages/twint/run.py", line 427, in Search
    run(config, callback)
  File "/home/usuario/.local/lib/python3.6/site-packages/twint/run.py", line 319, in run
    get_event_loop().run_until_complete(Twint(config).main(callback))
  File "/usr/lib/python3.6/asyncio/base_events.py", line 484, in run_until_complete
    return future.result()
  File "/home/usuario/.local/lib/python3.6/site-packages/twint/run.py", line 239, in main
    await task
  File "/home/usuario/.local/lib/python3.6/site-packages/twint/run.py", line 290, in run
    await self.tweets()
  File "/home/usuario/.local/lib/python3.6/site-packages/twint/run.py", line 230, in tweets
    await output.Tweets(tweet, self.config, self.conn)
  File "/home/usuario/.local/lib/python3.6/site-packages/twint/output.py", line 175, in Tweets
    await checkData(tweets, config, conn)
  File "/home/usuario/.local/lib/python3.6/site-packages/twint/output.py", line 159, in checkData
    elasticsearch.Tweet(tweet, config)
  File "/home/usuario/.local/lib/python3.6/site-packages/twint/storage/elasticsearch.py", line 296, in Tweet
    helpers.bulk(es, actions, chunk_size=2000, request_timeout=200)
  File "/usr/local/lib/python3.6/dist-packages/elasticsearch/helpers/actions.py", line 373, in bulk
    for ok, item in streaming_bulk(client, actions, *args, **kwargs):
  File "/usr/local/lib/python3.6/dist-packages/elasticsearch/helpers/actions.py", line 303, in streaming_bulk
    **kwargs
  File "/usr/local/lib/python3.6/dist-packages/elasticsearch/helpers/actions.py", line 230, in _process_bulk_chunk
    for item in gen:
  File "/usr/local/lib/python3.6/dist-packages/elasticsearch/helpers/actions.py", line 171, in _process_bulk_chunk_success
    raise BulkIndexError("%i document(s) failed to index." % len(errors), errors

@tweedge (Contributor, Author) commented Oct 28, 2020

That's an issue from your Elasticsearch endpoint, @overcyber
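
One way to find out why the endpoint rejected the documents is to look at the per-document error list that the traceback shows being passed as the second argument to BulkIndexError. The sketch below tallies failure reasons from such a list. The helper name summarize_bulk_errors is hypothetical, and the sample entries are made up for illustration; they are not output from this thread.

```python
from collections import Counter


def summarize_bulk_errors(errors):
    """Count (error type, reason) pairs in a bulk per-document error list."""
    reasons = Counter()
    for item in errors:
        # Each item is keyed by its action type, such as "index" or "create".
        for detail in item.values():
            error = detail.get("error", {})
            if isinstance(error, dict):
                key = (error.get("type", "unknown"), error.get("reason", ""))
            else:
                # Some responses carry the error as a plain string.
                key = ("unknown", str(error))
            reasons[key] += 1
    return reasons


# Made-up sample data shaped like bulk error items (not real output).
sample_errors = [
    {"index": {"_id": "1", "status": 400,
               "error": {"type": "mapper_parsing_exception",
                         "reason": "failed to parse field [date]"}}},
    {"index": {"_id": "2", "status": 400,
               "error": {"type": "mapper_parsing_exception",
                         "reason": "failed to parse field [date]"}}},
    {"index": {"_id": "3", "status": 429, "error": "rejected"}},
]

summary = summarize_bulk_errors(sample_errors)
for (error_type, reason), count in summary.most_common():
    print(f"{count} x {error_type}: {reason}")
```

In practice, the list would come from catching the exception around helpers.bulk and reading exc.args[1], which matches how the exception is raised in the traceback above.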

@overcyber

Thank you. I realized that, but I do not know how to fix it on the Elasticsearch side. Can you help me?

@pielco11 pielco11 merged commit 2348211 into twintproject:master Oct 29, 2020
darvell pushed a commit to darvell/twint that referenced this pull request Nov 16, 2020
* Fix ES publishing

* Remove hour() from elasticsearch.py
Successfully merging this pull request may close these issues.

Error when using storage to elasticsearch